Abstract. We study turn-based quantitative multiplayer non zero-sum games played on
finite graphs with reachability objectives. In such games, each player aims at reaching his 
own goal set of states as soon as possible. A previous work on this model showed that 
Nash equilibria (resp. secure equilibria) are guaranteed to exist in the multiplayer (resp. 
two-player) case. The existence of secure equilibria in the multiplayer case remained and is 
still an open problem. In this paper, we focus our study on the concept of subgame perfect 
equilibrium, a refinement of Nash equilibrium well-suited in the framework of games played 
on graphs. We also introduce the new concept of subgame perfect secure equilibrium. We 
prove the existence of subgame perfect equilibria (resp. subgame perfect secure equilibria) 
in multiplayer (resp. two-player) quantitative reachability games. Moreover, we provide 
an algorithm deciding the existence of secure equilibria in the multiplayer case. 



Introduction 

General framework. The construction of correct and efficient computer systems (hard- 
ware or software) is recognized as an extremely difficult task. To support the design and 
verification of such systems, mathematical logic, automata theory [HU79] and more recently
model-checking [CGP00] have been intensively studied. The efficiency of the model-checking
approach is widely recognized when applied to systems that can be accurately modeled as 
a finite-state automaton. In contrast, the application of these techniques to more com- 
plex systems like embedded systems or distributed systems has been less successful. This 
could be partly explained by the following reasons: classical automata-based models do not 
faithfully capture the complex behavior of modern computational systems that are usually 
composed of several interacting components, also interacting with an environment that is 
only partially under control. One recent trend to improve the automata models used in the 
classical approach of verification is to generalize these models with the more flexible and
mathematically deeper game-theoretic framework [Nas50, OR94].

The first steps to extend computational models with concepts from game theory were 
done with the so-called two-player zero-sum games played on graphs [GTW02]. Those
games are adequate to model controller-environment interaction problems [Tho95, Tho08].
Moves of player 1 model actions of the controller whereas moves of player 2 model the un- 
controllable actions of the environment, and a winning strategy for player 1 is an abstract 
form of a control program that enforces the control objective. However, only purely
antagonistic interactions between a controller and a hostile environment can be modeled in this
framework. In order to study more complex systems with more than two components and
objectives that are not necessarily antagonistic, we need multiplayer non zero-sum games.
While in zero-sum games we look for winning or optimal strategies, in non-zero-sum games 
we rather try to find relevant notions of equilibria, like the famous notion of Nash
equilibrium [Nas50]. The secure equilibrium [CHJ06] is a more recent concept that is especially
well-suited for assume-guarantee synthesis [CH07, CR10].

There is another interesting extension in such games: moving from qualitative to quan- 
titative objectives. A player has a qualitative objective if his aim is to enforce some spec- 
ification (as, for instance, reaching a certain set of target states of the graph), whereas a 
quantitative objective implies that he wants to minimize or maximize his gain. For example, 
a player may wish to reach a set of target states quickly or with a minimal consumption 
of energy. Until now, qualitative objectives have been studied more than quantitative
objectives. However, the latter objectives are as natural as the former, and so ought
to be considered. Consequently, we investigate here equilibria for multiplayer non zero-sum
games played on graphs with quantitative objectives. This article provides some new results
in this research direction; in particular, it is another step in the quest for solution concepts
well-suited for the computer-aided synthesis and verification of multi-agent systems.

Our contribution. We study turn-based multiplayer non zero-sum games played on finite 
graphs with quantitative reachability objectives, continuing work initiated in [BBDP10]. In
this framework each player aims at reaching his own goal as soon as possible. In [BBDP10],
among other results, it has been proved that a finite-memory Nash (resp. secure) equilibrium
always exists in multiplayer (resp. 2-player) games.

In this paper we consider alternative solution concepts to the classical notion of Nash 
equilibria. In particular, in the present framework of games on graphs, it is very natural 
to consider the notion of subgame perfect equilibrium [Sel65]: a choice of strategies is not 
only required to be a Nash equilibrium from the initial vertex, but also after every possible 
initial history of the game. Indeed if the initial state or the initial history of the system 
is not known, then a robust controller should be subgame perfect. We introduce a new 
and even stronger solution concept with the notion of subgame perfect secure equilibrium, 
which gathers both the sequential nature of subgame perfect equilibria and the
verification-oriented aspects of secure equilibria. These different notions of equilibria are precisely
defined in Section 1.

In this paper, we address the following problems: 

Problem 1. Given a multiplayer quantitative reachability game G, does there exist a Nash 
(resp. secure, subgame perfect, subgame perfect secure) equilibrium in G? 

Problem 2. Given a Nash (resp. secure, subgame perfect, subgame perfect secure)
equilibrium in a multiplayer quantitative reachability game G, does there exist such an equilibrium
with finite memory?

These questions have been positively solved by some of the authors in [BBDP10] for 
Nash equilibria in multiplayer games, and for secure equilibria in two-player games.
Notice that these problems and related ones have been extensively investigated in the qualitative
framework (see [GU08]).

Here we go a step further and establish the following results about subgame perfect 
and secure equilibria: 

• in every multiplayer quantitative reachability game, there exists a subgame perfect
equilibrium (Theorem 2.1),

• in every two-player quantitative reachability game, there exists a subgame perfect secure
equilibrium (Theorem 3.1),

• in every multiplayer quantitative reachability game, one can decide whether there exists
a secure equilibrium in ExpSpace (Theorem 4.1),

• if there exists a secure equilibrium in a multiplayer quantitative reachability game, then
there exists one that is finite-memory (Theorem 4.2).

The results in this paper first appeared in the proceedings of FoSSaCS 2012 [BBDPG12].
We here provide their complete proofs. 

Related work. Several recent papers have considered two-player zero-sum games played 
on finite graphs with regular objectives enriched by some quantitative aspects. Let us 
mention some of them: games with finitary objectives [CH06], games with prioritized
requirements [AKW08], request-response games where the waiting times between the requests
and the responses are minimized [HTW08, Zim09], and games whose winning conditions 
are expressed via quantitative languages [BCHJ09]. 

Other work concerns qualitative non zero-sum games. In [CHJ06], where the notion of
secure equilibrium has been introduced, it is proved that a unique maximal payoff profile of 
secure equilibria always exists for two-player non zero-sum games with regular objectives. 
In [GU08], general criteria ensuring existence of Nash equilibria and subgame perfect
equilibria (resp. secure equilibria) are provided for multiplayer (resp. 2-player) games, as well as
complexity results. In [BBM10], the existence of Nash equilibria is studied for timed games
with qualitative reachability objectives. Complexity issues are discussed in [BBMU11] about
Nash equilibria in multiplayer concurrent games with Büchi objectives.

Finally, let us mention work that combines both quantitative and non zero-sum
aspects. In [BG09], the authors study games played on graphs with terminal vertices where
quantitative payoffs are assigned to the players. These games may have cycles but all the 
infinite plays form a single outcome (like in chess where every infinite play is a draw). That 
paper gives criteria that ensure the existence of Nash (and subgame perfect) equilibria in 
pure and memoryless strategies. In [KLST12], the studied games are played on priced
graphs similar to the ones considered in this article, however in a concurrent way. In this
concurrent framework, Nash equilibria are not guaranteed to exist anymore. The authors
provide an algorithm to decide existence of Nash equilibria, thanks to a Büchi automaton
accepting all Nash equilibria outcomes. The complexity of some related decision problems
is also studied. In [PS09], the authors study Muller games on finite graphs where players
have a preference ordering on the sets of the Muller table. They show that Nash equilibria 
always exist for such games, and that it is decidable whether there exists a subgame perfect
equilibrium. In both cases they give a procedure to compute an equilibrium strategy profile
(when it exists). In [FKMY+10] (respectively [PS11]), it is shown that every multiplayer
sequential game has a subgame-perfect $\varepsilon$-equilibrium for every $\varepsilon > 0$ if the payoff functions
of the players are bounded and lower-semicontinuous (respectively upper-semicontinuous).



Organization of the paper. Section 1 is dedicated to definitions. We present the kinds
of games and equilibria that we study in this paper. In Section 2, we positively solve
Problem 1 for subgame perfect equilibria. In Section 3, this problem is also positively
solved for subgame perfect secure equilibria, but only in the two-player case. Finally, in
Section 4, we study Problems 1 and 2 in the context of secure equilibria. We partially solve
Problem 1 by providing an algorithm that decides the existence of a secure equilibrium.
We also positively solve Problem 2 for secure equilibria.



1. Preliminaries 



1.1. Games, Strategy Profiles and Equilibria. In this paper, we distinguish between 
qualitative and quantitative games. In a qualitative game, each player has a qualitative 
objective, meaning that he wants to guarantee that some property holds. In this case, his 
payoff for a play of the game is either 1 or 0 (the play does or does not satisfy the property,
respectively). On the other hand, in a quantitative game, each player has a quantitative
objective: he aims at minimizing (or maximizing) a certain value. His payoff for a play can
then be a real number or $\pm\infty$.

We consider here quantitative games played on a graph where all the players have 
reachability objectives. It means that, given a certain set of vertices $\mathrm{Goal}_i$, each player $i$
wants to reach one of these vertices as soon as possible. We recall the basic notions about
these games and we introduce different kinds of equilibria, like Nash equilibria. This section
is inspired by [BBDP10].

Definition 1.1. An infinite turn-based multiplayer quantitative reachability game is a tuple
$\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Goal}_i)_{i \in \Pi})$ where

• $\Pi$ is a finite set of players,

• $G = (V, (V_i)_{i \in \Pi}, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_i)_{i \in \Pi}$ is a
partition of $V$ into the state sets of each player, $E \subseteq V \times V$ is the set of edges, such
that for all $v \in V$, there exists $v' \in V$ with $(v, v') \in E$ (i.e., each vertex has at least one
outgoing edge), and

• $\mathrm{Goal}_i \subseteq V$ is the non-empty goal set of player $i$.

From now on, we often use the term game to denote a multiplayer quantitative reachability
game according to Definition 1.1.
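As an illustration of Definition 1.1, the following minimal Python sketch stores a game as plain data and checks the side conditions of the definition. The class name `ReachabilityGame`, its field names and the encoding of the partition as an `owner` dictionary are assumptions made for this sketch; they are not part of the paper.

```python
from dataclasses import dataclass


@dataclass
class ReachabilityGame:
    """Illustrative container for a game (Pi, V, (V_i), E, (Goal_i))."""
    players: frozenset  # Pi
    owner: dict         # vertex -> player owning it; encodes the partition (V_i)
    edges: frozenset    # E, a set of pairs (v, v')
    goals: dict         # player -> non-empty frozenset Goal_i

    @property
    def vertices(self):
        return frozenset(self.owner)

    def successors(self, v):
        return sorted(w for (u, w) in self.edges if u == v)

    def validate(self):
        V = self.vertices
        # Every vertex is owned by an actual player.
        assert all(p in self.players for p in self.owner.values())
        # Edges only connect vertices of V.
        assert all(u in V and w in V for (u, w) in self.edges)
        # Each vertex has at least one outgoing edge.
        assert all(self.successors(v) for v in V)
        # Each goal set is non-empty and included in V.
        assert all(self.goals[p] and self.goals[p] <= V for p in self.players)
        return True
```

Encoding the partition as a map from vertices to owners makes the "partition" requirement hold by construction, so only the remaining conditions need to be checked.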

It is often useful to specify an initial vertex $v_0 \in V$ for a game $\mathcal{G}$. We call the pair
$(\mathcal{G}, v_0)$ an initialized game. Sometimes we omit the word "initialized" and just talk about
games. The game $(\mathcal{G}, v_0)$ is played as follows. A token is first placed on the vertex $v_0$.
Player $i$, such that $v_0 \in V_i$, has to choose one of the outgoing edges of $v_0$ and put the token
on the vertex $v_1$ reached when following this edge. Then, it is the turn of the player who
owns $v_1$. And so on.

A play $\rho \in V^\omega$ (resp. a history $h \in V^+$) of $(\mathcal{G}, v_0)$ is an infinite (resp. a finite) path
through the graph $G$ starting from vertex $v_0$. Note that a history is always non-empty
because it starts with $v_0$. The set $H \subseteq V^+$ is made up of all the histories of $\mathcal{G}$, and for
$i \in \Pi$, the set $H_i$ is the set of all histories $h \in H$ whose last vertex belongs to $V_i$.

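For any play $\rho = \rho_0 \rho_1 \ldots$ of $\mathcal{G}$, we define $\mathrm{Cost}_i(\rho)$, the cost of player $i$, as:

$$\mathrm{Cost}_i(\rho) = \begin{cases} l & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Goal}_i, \\ +\infty & \text{otherwise.} \end{cases} \tag{1.1}$$

We write $\mathrm{Cost}(\rho) = (\mathrm{Cost}_i(\rho))_{i \in \Pi}$ for the cost profile of the play $\rho$. Each player $i$ aims to
minimize the cost he has to pay, i.e. reach his goal set as soon as possible. The cost profile
for a history $h$ is defined similarly.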
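The cost function can be made concrete for ultimately periodic plays. The sketch below assumes that a play is given as a finite prefix followed by a cycle repeated forever; this representation and the function names `cost` and `cost_profile` are choices made for the example, not notation from the paper.

```python
import math


def cost(prefix, cycle, goal):
    """Cost of the play prefix . cycle^omega for a player with goal set `goal`."""
    # Every vertex of the infinite play already occurs in the prefix
    # followed by one copy of the cycle, so scanning that is enough.
    for index, vertex in enumerate(list(prefix) + list(cycle)):
        if vertex in goal:
            return index
    return math.inf


def cost_profile(prefix, cycle, goals):
    """Cost of each player; `goals` maps a player to his goal set."""
    return {player: cost(prefix, cycle, goal) for player, goal in goals.items()}
```

For instance, on the play $A^3 B^\omega$ a player whose goal set is $\{B\}$ pays 3, and a player whose goal set is never visited pays $+\infty$.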

A prefix (resp. proper prefix) $\alpha$ of a history $h = h_0 \ldots h_k$ is a finite sequence $h_0 \ldots h_l$,
with $l \le k$ (resp. $l < k$), denoted by $\alpha \le h$ (resp. $\alpha < h$). We similarly consider a prefix $\alpha$
of a play $\rho$, denoted by $\alpha < \rho$. The function $\mathrm{Last}$ returns, given a history $h = h_0 \ldots h_k$,
the last vertex $h_k$ of $h$, and the length $|h|$ of $h$ is the number $k$ of its edges. Note that the
length is not defined as the number of vertices. Given a play $\rho = \rho_0 \rho_1 \ldots$, we denote by $\rho_{\le l}$
the prefix of $\rho$ of length $l$, i.e. $\rho_{\le l} = \rho_0 \rho_1 \ldots \rho_l$. Similarly, $\rho_{<l} = \rho_0 \rho_1 \ldots \rho_{l-1}$.

We say that a play $\rho = \rho_0 \rho_1 \ldots$ visits a set $S \subseteq V$ (resp. a vertex $v \in V$) if there exists
$l \in \mathbb{N}$ such that $\rho_l$ is in $S$ (resp. $\rho_l = v$). The same terminology also stands for a history $h$.
More precisely, we say that $\rho$ visits a set $S$ at (resp. before) depth $d \in \mathbb{N}$ if $\rho_d$ is in $S$ (resp.
if there exists $l < d$ such that $\rho_l$ is in $S$). For any play $\rho$ we denote by $\mathrm{Visit}(\rho)$ the set of
players $i \in \Pi$ such that $\rho$ visits $\mathrm{Goal}_i$. The set $\mathrm{Visit}(h)$ for a history $h$ is defined similarly.

A strategy of player $i$ in $\mathcal{G}$ is a function $\sigma : H_i \to V$ assigning to each history $h \in H_i$
a next vertex $\sigma(h)$ such that $(\mathrm{Last}(h), \sigma(h))$ belongs to $E$. We say that a play $\rho = \rho_0 \rho_1 \ldots$
of $\mathcal{G}$ is consistent with a strategy $\sigma$ of player $i$ if $\rho_{k+1} = \sigma(\rho_0 \ldots \rho_k)$ for all $k \in \mathbb{N}$ such
that $\rho_k \in V_i$. The same terminology is used for a history $h$ of $\mathcal{G}$. A strategy profile of $\mathcal{G}$
is a tuple $(\sigma_i)_{i \in \Pi}$ where $\sigma_i$ is a strategy for player $i$. It determines a unique play in the
initialized game $(\mathcal{G}, v_0)$ consistent with each strategy $\sigma_i$, called the outcome of $(\sigma_i)_{i \in \Pi}$ and
denoted by $\langle (\sigma_i)_{i \in \Pi} \rangle_{v_0}$. We write $\sigma_{-j}$ for $(\sigma_i)_{i \in \Pi \setminus \{j\}}$, the set of strategies $\sigma_i$ for all the
players except for player $j$.

A strategy $\sigma$ of player $i$ is memoryless if $\sigma$ depends only on the current vertex, i.e.
$\sigma(h) = \sigma(\mathrm{Last}(h))$ for all $h \in H_i$. More generally, $\sigma$ is a finite-memory strategy if the
equivalence relation $\approx_\sigma$ defined by $h \approx_\sigma h'$ if $\sigma(h\delta) = \sigma(h'\delta)$ for all $\delta \in H_i$ has finite
index. In other words, a finite-memory strategy is a strategy that can be implemented by
a finite automaton with output. A strategy profile $(\sigma_i)_{i \in \Pi}$ is called memoryless or
finite-memory if each $\sigma_i$ is a memoryless or a finite-memory strategy, respectively.

For a strategy profile $(\sigma_i)_{i \in \Pi}$ with outcome $\rho$ and a strategy $\sigma'_j$ of player $j$, we say that
player $j$ deviates from $\rho$ if there exists a prefix $h$ of $\rho$, consistent with $\sigma'_j$, such that $h \in H_j$
and $\sigma'_j(h) \ne \sigma_j(h)$.

We now introduce different notions of equilibria in the quantitative framework and give 
several examples to make clear the presented concepts. We first begin with the definition 
of Nash equilibrium. 

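Definition 1.2. A strategy profile $(\sigma_i)_{i \in \Pi}$ of a game $(\mathcal{G}, v_0)$ is a Nash equilibrium if for
every player $j \in \Pi$ and every strategy $\sigma'_j$ of player $j$, we have:

$$\mathrm{Cost}_j(\rho) \le \mathrm{Cost}_j(\rho')$$

where $\rho = \langle (\sigma_i)_{i \in \Pi} \rangle_{v_0}$ and $\rho' = \langle \sigma'_j, \sigma_{-j} \rangle_{v_0}$. Recall that $\mathrm{Cost}_j(\rho)$ is the least index $l$
such that $\rho_l \in \mathrm{Goal}_j$, and $+\infty$ if no such index exists.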

This definition means that for all $j \in \Pi$, player $j$ has no incentive to deviate since
he cannot strictly decrease his cost when using $\sigma'_j$ instead of $\sigma_j$. Keeping notations of
Definition 1.2 in mind, a strategy $\sigma'_j$ such that $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ is called a profitable
deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$. In this case, either player $j$ pays an infinite cost for $\rho$
and a finite cost for $\rho'$ (i.e. $\rho'$ visits $\mathrm{Goal}_j$, but $\rho$ does not), or player $j$ pays a finite cost
for $\rho$ and a strictly lower cost for $\rho'$ (i.e. $\rho'$ visits $\mathrm{Goal}_j$ for the first time earlier than $\rho$ does).
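For memoryless profiles, the absence of a profitable deviation can be checked mechanically. The following sketch is not taken from the paper: it assumes that the whole profile is given as one dictionary `sigma` mapping each vertex to the successor chosen by its owner, and it uses the observation that, when the other players are memoryless, the best cost a deviating player can obtain is a shortest-path distance in the graph where the other players' choices are fixed. The function names and the example graph in the usage note are hypothetical.

```python
import math
from collections import deque


def outcome_costs(v0, sigma, goals):
    """Cost profile of the play that follows the memoryless profile sigma from v0."""
    costs = {p: math.inf for p in goals}
    seen, v, step = set(), v0, 0
    while v not in seen:  # once a vertex repeats, no new vertex can appear
        seen.add(v)
        for p, goal in goals.items():
            if v in goal and costs[p] == math.inf:
                costs[p] = step
        v, step = sigma[v], step + 1
    return costs


def best_response_cost(v0, j, owner, succ, sigma, goals):
    """Least cost player j can get while the other players keep following sigma (BFS)."""
    dist, queue = {v0: 0}, deque([v0])
    while queue:
        v = queue.popleft()
        if v in goals[j]:
            return dist[v]
        # Player j may pick any edge; the other players follow sigma.
        moves = succ[v] if owner[v] == j else [sigma[v]]
        for w in moves:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return math.inf


def is_nash(v0, owner, succ, sigma, goals):
    """True iff no player has a profitable deviation from the memoryless profile sigma."""
    costs = outcome_costs(v0, sigma, goals)
    return all(costs[j] <= best_response_cost(v0, j, owner, succ, sigma, goals)
               for j in goals)
```

On a hypothetical tree-shaped graph with `succ = {"a": ["b", "c"], "b": ["d"], "c": ["e", "f"], "d": ["d"], "e": ["e"], "f": ["f"]}`, where player 1 owns `a`, `d`, `e`, `f` and wants to reach `{d, f}` and player 2 owns `b`, `c` and wants to reach `{f}`, the profile choosing `a -> b` and `c -> e` passes the check, while the profile choosing `a -> c` and `c -> e` does not. The check only covers memoryless profiles; it says nothing about subgame perfection.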

We now define the concept of secure equilibrium. We first need to associate a binary
relation $\prec_j$ on cost profiles with each player $j$. Given two cost profiles $(x_i)_{i \in \Pi}$ and $(y_i)_{i \in \Pi}$:

$$(x_i)_{i \in \Pi} \prec_j (y_i)_{i \in \Pi} \quad \text{iff} \quad (x_j > y_j) \,\vee\, \big( x_j = y_j \,\wedge\, (\forall i \in \Pi \;\; x_i \le y_i) \,\wedge\, (\exists i \in \Pi \;\; x_i < y_i) \big).$$

We then say that player $j$ prefers $(y_i)_{i \in \Pi}$ to $(x_i)_{i \in \Pi}$. In other words, player $j$ prefers a cost
profile to another one either if he has a strictly lower cost, or if he keeps the same cost, the
other players have a greater cost, and at least one has a strictly greater cost.
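The relation $\prec_j$ translates directly into a small predicate. In this sketch, cost profiles are assumed to be dictionaries from players to costs (with `math.inf` for $+\infty$); the function name `prefers` is chosen for the example.

```python
import math


def prefers(j, x, y):
    """True iff x <_j y, i.e. player j prefers the cost profile y to the cost profile x."""
    if x[j] > y[j]:
        return True  # strictly lower own cost in y
    return (x[j] == y[j]
            and all(x[i] <= y[i] for i in x)   # nobody pays less in y
            and any(x[i] < y[i] for i in x))   # somebody pays strictly more in y
```

For example, with `x = {1: 2, 2: 2}` and `y = {1: 2, 2: math.inf}`, player 1 prefers `y` to `x`: his own cost is unchanged and player 2 pays strictly more.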

Definition 1.3. A strategy profile $(\sigma_i)_{i \in \Pi}$ of a game $(\mathcal{G}, v_0)$ is a secure equilibrium if for
every player $j \in \Pi$, there does not exist any strategy $\sigma'_j$ of player $j$ such that:

$$\mathrm{Cost}(\rho) \prec_j \mathrm{Cost}(\rho')$$

where $\rho = \langle (\sigma_i)_{i \in \Pi} \rangle_{v_0}$ and $\rho' = \langle \sigma'_j, \sigma_{-j} \rangle_{v_0}$.

In other words, player $j$ has no incentive to deviate w.r.t. relation $\prec_j$. A strategy $\sigma'_j$
such that $\mathrm{Cost}(\rho) \prec_j \mathrm{Cost}(\rho')$ is called a $\prec_j$-profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$.
Clearly, any secure equilibrium is a Nash equilibrium.

In a secure equilibrium, each player tries first to minimize his own cost, and then to
maximize the costs of the other players. According to [CHJ06], a secure profile can be seen
as a contract between the players which strengthens cooperation in the following sense: any
unilateral selfish deviation by one player cannot put the other players at a disadvantage
if they follow the contract. For more intuition and motivation about secure equilibria, see
[CHJ06, CH07, CR10].

We now introduce a third type of equilibrium: the subgame perfect equilibrium. In this 
case, a strategy profile is not only required to be a Nash equilibrium from the initial vertex, 
but also after every possible initial history of the game. Before giving the definition, we 
introduce the concept of subgame and explain some notations. 

Given a game $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Goal}_i)_{i \in \Pi})$, an initial vertex $v_0$, and a history $hv$
of $(\mathcal{G}, v_0)$, with $v \in V$ ($h$ might be empty), the subgame $(\mathcal{G}|_h, v)$ of $(\mathcal{G}, v_0)$ with history $hv$
is the game $\mathcal{G}|_h = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Goal}_i)_{i \in \Pi})$ initialized at $v$ and such that the cost of a
play $\pi$ of $(\mathcal{G}|_h, v)$ for player $i$ is given by $\mathrm{Cost}_i(h\pi)$. Notice that the only difference between
$(\mathcal{G}, v_0)$ and $(\mathcal{G}|_h, v)$ occurs in the costs of the plays. The cost for a play in the subgame
$(\mathcal{G}|_h, v)$ depends on the considered history $h$ (the goal set $\mathrm{Goal}_i$ could have already been
visited by $h$). Given a strategy $\sigma_i$ for player $i$ in $\mathcal{G}$, we define the strategy $\sigma_i|_h$ in $\mathcal{G}|_h$
by $\sigma_i|_h(h') = \sigma_i(hh')$ for all histories $h'$ of $(\mathcal{G}|_h, v)$ such that $\mathrm{Last}(h') \in V_i$. Let $\sigma$ be the
strategy profile $(\sigma_i)_{i \in \Pi}$; we write $\sigma|_h$ for $(\sigma_i|_h)_{i \in \Pi}$, and $h\langle \sigma|_h \rangle_v$ for the play in $(\mathcal{G}, v_0)$ with
prefix $h$ that is consistent with $\sigma|_h$ from $v$.



Footnote (on the definition of secure equilibrium). Our definition naturally extends the notion of
secure equilibrium proposed in [CHJ06] to the quantitative framework.

Then, we say that $(\sigma_i|_h)_{i \in \Pi}$ is a Nash equilibrium in the subgame $(\mathcal{G}|_h, v)$ if for every
player $j \in \Pi$ and every strategy $\sigma'_j$ of player $j$, we have that $\mathrm{Cost}_j(\rho) \le \mathrm{Cost}_j(\rho')$, where
$\rho = h\langle (\sigma_i|_h)_{i \in \Pi} \rangle_v$ and $\rho' = h\langle \sigma'_j|_h, \sigma_{-j}|_h \rangle_v$. The definition of a secure equilibrium in $(\mathcal{G}|_h, v)$
is given similarly.

A subgame perfect equilibrium is a strategy profile that is a Nash equilibrium after 
every possible history of the game, i.e. in every subgame. In particular, a subgame perfect 
equilibrium is also a Nash equilibrium. 



Definition 1.4. A strategy profile $(\sigma_i)_{i \in \Pi}$ of a game $(\mathcal{G}, v_0)$ is a subgame perfect equilibrium
if for all histories $hv$ of $(\mathcal{G}, v_0)$, with $v \in V$, $(\sigma_i|_h)_{i \in \Pi}$ is a Nash equilibrium in the subgame
$(\mathcal{G}|_h, v)$.



We now introduce the last kind of equilibrium that we study. It is a new notion 
that combines both concepts of subgame perfect equilibrium and secure equilibrium in the 
following way. 

Definition 1.5. A strategy profile $(\sigma_i)_{i \in \Pi}$ of a game $(\mathcal{G}, v_0)$ is a subgame perfect secure
equilibrium if for all histories $hv$ of $(\mathcal{G}, v_0)$, with $v \in V$, $(\sigma_i|_h)_{i \in \Pi}$ is a secure equilibrium in
the subgame $(\mathcal{G}|_h, v)$.

Notice that a subgame perfect secure equilibrium is a secure equilibrium, as well as a 
subgame perfect equilibrium. 

In order to understand the differences between the various notions of equilibria, we 
provide three simple examples of games limited to two players and to finite trees. 

Example 1.6. Let $\mathcal{G} = (V, V_1, V_2, E, \mathrm{Goal}_1, \mathrm{Goal}_2)$ be the two-player game depicted in
Fig. 1. The vertices of player 1 (resp. 2) are represented by circles (resp. squares), that
is, $V_1 = \{A, D, E, F\}$ and $V_2 = \{B, C\}$. The initial vertex $v_0$ is $A$. The vertices of $\mathrm{Goal}_1$
are shaded whereas the vertices of $\mathrm{Goal}_2$ are doubly circled; thus $\mathrm{Goal}_1 = \{D, F\}$ and
$\mathrm{Goal}_2 = \{F\}$ in $\mathcal{G}$. The number 2 labeling the edge $(B, D)$ is a shortcut to indicate that
there are two consecutive edges from $B$ to $D$ (through one intermediate vertex). We will
keep these conventions throughout the article.




Figure 1: Game $\mathcal{G}$.

Figure 2: Game $\mathcal{G}'$.

Figure 3: Game $\mathcal{G}''$.



In the games $\mathcal{G}$, $\mathcal{G}'$ and $\mathcal{G}''$ of Fig. 1, 2 and 3 (played on the same graph), we define two
strategies $\sigma_1, \sigma'_1$ of player 1 and two strategies $\sigma_2, \sigma'_2$ of player 2 in the following way:
$\sigma_1(A) = B$, $\sigma'_1(A) = C$, $\sigma_2(C) = E$ and $\sigma'_2(C) = F$.

In $(\mathcal{G}, A)$, one can easily check that the strategy profile $(\sigma_1, \sigma_2)$ is a secure equilibrium
(and thus a Nash equilibrium) with cost profile $(3, +\infty)$. Such a secure equilibrium exists
because player 2 threatens player 1 to go to vertex $E$ in the case where vertex $C$ is reached.
This threat is not credible in this case since by acting this way, player 2 gets an infinite cost
instead of a cost of 2 (that he could obtain by reaching $F$). For this reason, $(\sigma_1, \sigma_2)$ is not a
subgame perfect equilibrium (and thus not a subgame perfect secure equilibrium). However,
one can check that the strategy profile $(\sigma'_1, \sigma'_2)$ is a subgame perfect secure equilibrium.

Let us now consider the game $(\mathcal{G}', A)$ depicted in Fig. 2 (notice that the number 2 has
disappeared from the edge $(B, D)$, but $\mathrm{Goal}_1$ and $\mathrm{Goal}_2$ remain the same). One can verify
that the strategy profile $(\sigma'_1, \sigma'_2)$ is a subgame perfect equilibrium which is not a secure
equilibrium (and thus not a subgame perfect secure equilibrium). A subgame perfect secure
equilibrium for $(\mathcal{G}', A)$ is given by the strategy profile $(\sigma_1, \sigma'_2)$.

Finally, for the game $(\mathcal{G}'', A)$ depicted in Fig. 3 (where $\mathrm{Goal}_1 = \{D, F\}$ and $\mathrm{Goal}_2 =
\{E, F\}$), one can check that the strategy profile $(\sigma_1, \sigma'_2)$ is both a subgame perfect
equilibrium and a secure equilibrium. However it is not a subgame perfect secure equilibrium. In
particular, this shows that being a subgame perfect secure equilibrium is not equivalent to
being both a subgame perfect equilibrium and a secure equilibrium. On the other hand, $(\sigma_1, \sigma_2)$ is
a subgame perfect secure equilibrium in $(\mathcal{G}'', A)$.

The general philosophy of our work is to investigate interesting concepts of equilibria in 
multiplayer quantitative reachability games. In these games, each player aims at reaching 
his goal set as soon as possible. Having that in mind, a play where a goal set is visited for 
the first time after cycles where no new goal set is visited does not seem to be a desirable
behavior (see the definition of unnecessary cycle below). It appears thus reasonable to seek 
equilibrium concepts with outcomes that do not present this undesirable feature. 

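Definition 1.7. Given a play $\rho = \alpha\beta\tilde{\rho}$ in a game $(\mathcal{G}, v_0)$, such that $\beta$ is non-empty,
$\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$, $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta)$ and $\mathrm{Visit}(\alpha) \ne \mathrm{Visit}(\rho)$, the cycle $\beta$ is called an
unnecessary cycle.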

Example 1.8. Let us exhibit an example of this phenomenon on the two-player game
$(\mathcal{G}, A)$ depicted in Fig. 4 (we use the same conventions as in Example 1.6). For $n > 0$, let
us consider the play $A^n B^\omega$. Along this play, the cycles $A^{n-1}$, for $n > 1$, are unnecessary
cycles. Indeed, once $\mathrm{Goal}_1$ is visited (in $A$), looping $n$ times on $A$ just delays the first visit
to $\mathrm{Goal}_2$ (in $B$). However, for each $n > 0$, one can build a subgame perfect equilibrium
$(\sigma_1^n, \sigma_2)$ whose outcome is $A^n B^\omega$ and cost profile is $(0, n)$, as follows:

$$\sigma_1^n(h) = \begin{cases} A & \text{if } h = A^j \text{ with } j < n, \\ B & \text{otherwise.} \end{cases}$$

This allows us to conclude that the notion of subgame perfect equilibrium does not prevent
the existence of outcomes with unnecessary cycles. We can notice that $(\sigma_1^n, \sigma_2)$ is not a
secure equilibrium, for all $n > 0$. However, we will see in the next example that secure
equilibria can also allow this kind of undesirable behavior.
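The strategy $\sigma_1^n$ of Example 1.8 can be simulated directly. The sketch below assumes the graph suggested by the text: player 1 owns $A$, which has a self-loop and an edge to $B$; player 2 owns $B$, which only has a self-loop; $\mathrm{Goal}_1 = \{A\}$ and $\mathrm{Goal}_2 = \{B\}$. Since Fig. 4 itself is not reproduced here, these details, as well as the function names, are assumptions.

```python
import math


def sigma1(n):
    """Strategy of player 1: stay on A while the history is A^j with j < n."""
    def strategy(history):
        if all(v == "A" for v in history) and len(history) < n:
            return "A"
        return "B"
    return strategy


def sigma2(history):
    """Strategy of player 2 (assumed: B only has a self-loop)."""
    return "B"


def outcome_prefix(n, length):
    """First `length` vertices of the outcome of (sigma1(n), sigma2) from A."""
    owner = {"A": 1, "B": 2}
    strategies = {1: sigma1(n), 2: sigma2}
    history = ["A"]
    while len(history) < length:
        history.append(strategies[owner[history[-1]]](history))
    return "".join(history)


def costs(play_prefix):
    """Costs of players 1 and 2 read off a sufficiently long prefix of the play."""
    goal = {1: "A", 2: "B"}
    return tuple(play_prefix.find(goal[p]) if goal[p] in play_prefix else math.inf
                 for p in (1, 2))
```

With `n = 3`, `outcome_prefix(3, 6)` yields `"AAABBB"`, a prefix of $A^3 B^\omega$, and `costs("AAABBB")` yields `(0, 3)`, matching the cost profile $(0, n)$.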




Figure 4: Subgame perfect equilibrium with outcome $A^n B^\omega$.

Figure 5: Secure equilibrium with outcome $A^n B C^\omega$.



Let us consider the game of Fig. 5 initialized at $A$. For $n > 1$, the cycles $A^{n-1}$ are
unnecessary along the play $A^n B C^\omega$. However, for each $n > 0$, we can build a secure equilibrium
$(\sigma_1^n, \sigma_2^n)$ whose outcome is $A^n B C^\omega$ and cost profile is $(n + 1, n + 1)$, as follows:



For each $n > 0$, the fact that $(\sigma_1^n, \sigma_2^n)$ is a secure equilibrium is based on the following threat
of player 2 against player 1: player 2 pretends that he will only decide to visit vertex $C$ if
player 1 has visited vertex $A$ exactly $n$ times. This behavior is not credible since player 2's
interest is to reach vertex $C$ as soon as possible. In other words, we have that $(\sigma_1^n, \sigma_2^n)$ is
not a subgame perfect equilibrium (and thus not a subgame perfect secure equilibrium).

Those examples motivate the introduction of the notion of subgame perfect secure 
equilibrium. We believe that this notion can help in avoiding the undesirable behaviors 
of unnecessary cycles. More generally, a deeper understanding of the studied equilibria 
whose outcomes have unnecessary cycles could be very useful. A more subtle example of a 
three-player game will be discussed in Example 4.9.

In the sequel, we study and partially solve Problem 1 and Problem 2. The next three
sections contain useful material for the proofs of our results. 

1.2. Qualitative Two-player Zero-sum Games. In this section we recall well-known 
properties of qualitative two-player zero-sum games [Tho08]. They will be useful for our
proofs, especially in the context of deviations of a player with respect to a strategy profile: 
we thus face a two-player zero-sum game where the player who deviates plays against the 
coalition of the other players. 

We first recall the notion of weak parity game. 

Definition 1.9. A qualitative two-player zero-sum weak parity game is a tuple $\mathcal{G} = (V, V_1, V_2,
E, c)$ where

• $G = (V, V_1, V_2, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_1, V_2)$ is a
partition of $V$ into the vertex sets of player 1 and player 2, and $E \subseteq V \times V$ is the set of
edges,

• $c : V \to \mathbb{N}$ is the coloring function.

Player 1 (resp. player 2) wins a play $\rho = \rho_0 \rho_1 \ldots \in V^\omega$ of the game $\mathcal{G}$ if the maximum color
in the sequence $c(\rho_0) c(\rho_1) c(\rho_2) \ldots$ is even (resp. odd).

Given an initial vertex $v_0 \in V$, the notions of play, history and strategy are the same
as the ones defined in Section 1.1. The game is said to be zero-sum because every play is won by
exactly one of the two players.

In zero-sum games, it is interesting to know if one of the players can play in such a way
that he is sure to win, however the other player plays. This is formalized with the notion of
winning strategy. A strategy $\sigma_i$ for player $i$ is a winning strategy from an initial vertex $v$ if
all the plays of $\mathcal{G}$ starting from $v$ that are consistent with $\sigma_i$ are won by player $i$. If player $i$
has a winning strategy in $\mathcal{G}$ from $v$, we say that player $i$ wins the game $\mathcal{G}$ from $v$. We say
that a game $\mathcal{G}$ is determined if for all $v \in V$, one of the two players has a winning strategy
from $v$.

Martin showed [Mar75] that every qualitative two-player zero-sum game with a Borel
type winning condition is determined. In particular, we have the following proposition:

Proposition 1.10. [Tho08, Theorem 5] Let $\mathcal{G} = (V, V_1, V_2, E, c)$ be a qualitative two-player
zero-sum weak parity game. Then for all $v \in V$, one of the two players has a memoryless
winning strategy from $v$ (in particular, $\mathcal{G}$ is determined).

We here consider three special cases of the weak parity condition: reachability, safety,
and reachability under safety conditions. A qualitative two-player zero-sum reachability
under safety game is denoted

$$\mathcal{G} = (V, V_1, V_2, E, R, S)$$

where $R, S \subseteq V$ and $R \ne \emptyset$. In such a game, player 1 wins a play $\rho$ iff $\rho$ visits $R$ (i.e.,
$\exists i \;\, \rho_i \in R$) while staying in $S$ (i.e., $\forall i \;\, \rho_i \in S$). The reachability under safety condition
can be encoded with a weak parity condition by defining the coloring function $c$ as follows:
$c(v) = 3$ if $v \notin S$, $c(v) = 2$ if $v \in R$ and $c(v) = 1$ otherwise. Reachability games (resp.
safety games) are special cases of reachability under safety games $\mathcal{G} = (V, V_1, V_2, E, R, S)$
where $S = V$ (resp. $R = V$). We can now state a corollary of Proposition 1.10.

Corollary 1.11. Let $\mathcal{G} = (V, V_1, V_2, E, R, S)$ be a qualitative two-player zero-sum
reachability under safety game. Then the game $\mathcal{G}$ is determined and player 1 has a memoryless
strategy $\nu_1$ that enables him to reach $R$ within $|V| - 1$ edges, while staying in $S$, from each
vertex $v$ from which he wins the game.
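To make the reachability under safety condition more tangible, here is a minimal sketch of the weak parity coloring described above, together with a standard attractor-style fixpoint that computes the vertices from which player 1 wins and a memoryless strategy for him. The attractor construction is the classical technique for such games and is not spelled out in this section; the function names, the representation of the graph by a successor dictionary `succ`, and the set `owner1` of player 1's vertices are assumptions of the example.

```python
def coloring(R, S):
    """Weak parity coloring encoding 'reach R while staying in S'."""
    def c(v):
        if v not in S:
            return 3
        return 2 if v in R else 1
    return c


def winning_region(vertices, owner1, succ, R, S):
    """Vertices from which player 1 can reach R while staying in S,
    with a memoryless strategy for player 1 on those vertices."""
    win = {v for v in vertices if v in R and v in S}
    strategy = {}
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in win or v not in S:
                continue
            good = [w for w in succ[v] if w in win]
            # Player 1 needs one good successor, player 2 must have only good ones.
            ok = bool(good) if v in owner1 else len(good) == len(succ[v])
            if ok:
                win.add(v)
                if v in owner1:
                    strategy[v] = good[0]  # successor that was added strictly earlier
                changed = True
    return win, strategy
```

Each vertex enters `win` only after the successor chosen for it, so following `strategy` reaches $R$ without repeating a vertex, which is consistent with the bound of $|V| - 1$ edges in Corollary 1.11.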

In the sequel, we apply Corollary 1.11 on particular two-player games. Given a multiplayer
quantitative reachability game $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Goal}_i)_{i \in \Pi})$ and a player $i \in \Pi$,
we denote by $\mathcal{G}_i = (V, V_i, V \setminus V_i, E, R, S)$ (or $(G_i, R, S)$ in short) the qualitative two-player
zero-sum reachability under safety game associated with player $i$. This game is played on
the graph $G_i = (V, V_i, V \setminus V_i, E)$, where player $i$ plays against the coalition of all the other
players. Player $i$ controls the vertices of $V_i$ and the coalition those of $V \setminus V_i$; player $i$ aims
at reaching $R$ while staying in $S$, and the coalition wants to prevent this.

1.3. Unraveling. In the proofs of this article, it will often be useful to unravel the graph
$G = (V, (V_i)_{i \in \Pi}, E)$ from an initial vertex $v_0$, which ends up in an infinite tree, denoted
by $T$. This tree can be seen as a new graph where the set of vertices is the set $H$ of histories
of $\mathcal{G}$, the initial vertex is $v_0$, and a pair $(h, hv) \in H \times H$ is an edge of $T$ if $(\mathrm{Last}(h), v) \in E$.
A history $h$ is a vertex of player $i$ in $T$ if $h \in H_i$, and $h$ belongs to the goal set of player $i$
if $\mathrm{Last}(h) \in \mathrm{Goal}_i$.

We denote by $\mathcal{T}$ the related game. This game $\mathcal{T}$ played on the unraveling $T$ of $G$
from $v_0$ is equivalent to the game $(\mathcal{G}, v_0)$ played on the graph $G$ in the following sense.
A play $(\rho_0)(\rho_0\rho_1)(\rho_0\rho_1\rho_2)\ldots$ in $\mathcal{T}$ induces a unique play $\rho = \rho_0\rho_1\rho_2\ldots$ in $(\mathcal{G}, v_0)$, and
conversely. Thus, we denote a play in $\mathcal{T}$ by the respective play in $(\mathcal{G}, v_0)$. The bijection
between plays of $(\mathcal{G}, v_0)$ and plays of $\mathcal{T}$ allows us to use the same cost function Cost, and
to transform easily strategies in $\mathcal{G}$ to strategies in $\mathcal{T}$ (and conversely).

For practical reasons, we often consider equivalently $\mathcal{T}$ in our proofs instead of $(\mathcal{G}, v_0)$,
and the equilibria defined in $\mathcal{T}$ are obviously equilibria in $(\mathcal{G}, v_0)$. Moreover, figures given
in proofs to help the understanding roughly represent the unraveling $T$ of $G$ and plays in
game $\mathcal{T}$.

We also need to study the tree $T$ limited to a certain depth $d \in \mathbb{N}$: we denote
by $\mathrm{Trunc}_d(T)$ the truncated tree of $T$ of depth $d$ and $\mathrm{Trunc}_d(\mathcal{T})$ the finite game played
on $\mathrm{Trunc}_d(T)$. More precisely, the set of vertices of $\mathrm{Trunc}_d(T)$ is the set of histories $h \in H$
of length $\le d$; the edges of $\mathrm{Trunc}_d(T)$ are defined in the same way as for $T$, except that for
the histories $h$ of length $d$, there exists no edge $(h, hv)$. A play $\rho$ in $\mathrm{Trunc}_d(\mathcal{T})$ corresponds
to a history of $(\mathcal{G}, v_0)$ of length equal to $d$. The notions of cost and strategy are defined
exactly like in the game $\mathcal{T}$, but limited to the depth $d$. For instance, a player pays an
infinite cost for a play $\rho$ (of length $d$) if his goal set is not visited by $\rho$.
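
The truncated unraveling can be sketched in a few lines of Python. The helper names
`truncated_histories` and `cost`, the encoding of the graph as a successor dictionary, and
the convention that the length of a history is its number of edges are assumptions made
for this illustration.

```python
import math

def truncated_histories(succ, v0, d):
    """Vertices of the truncated tree: all histories from v0 with at most d edges."""
    level = [(v0,)]
    histories = list(level)
    for _ in range(d):
        # Extend every history of the current depth by one edge of the graph.
        level = [h + (w,) for h in level for w in succ[h[-1]]]
        histories.extend(level)
    return histories

def cost(play, goal):
    """Depth of the first visit to the goal set along a (truncated) play, else +infinity."""
    for depth, v in enumerate(play):
        if v in goal:
            return depth
    return math.inf
```

The histories of maximal length returned by `truncated_histories` are exactly the plays of
the truncated game, and `cost` assigns an infinite cost when the goal set is not visited
within depth $d$.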

1.4. Kuhn's Theorem. This section is devoted to the classical Kuhn's theorem [Kuh53].
It claims the existence of a subgame perfect equilibrium (resp. subgame perfect secure 
equilibrium) in multiplayer games played on finite trees. 

A preference relation is a total, reflexive and transitive binary relation. 

Theorem 1.12 (Kuhn's theorem). Let $T$ be a finite tree and $\mathcal{G}$ a game played on $T$. For
each player $i \in \Pi$, let $\precsim_i$ be a preference relation on cost profiles. Then there exists a
strategy profile $(\sigma_i)_{i \in \Pi}$ such that for every history $hv$ of $\mathcal{G}$, every player $j \in \Pi$, and every
strategy $\sigma'_j$ of player $j$ in $\mathcal{G}$, we have

$$\mathrm{Cost}(\rho') \precsim_j \mathrm{Cost}(\rho)$$

where $\rho = h\langle(\sigma_i|_h)_{i \in \Pi}\rangle_v$ and $\rho' = h\langle\sigma'_j|_h, \sigma_{-j}|_h\rangle_v$.

One can easily be convinced that the binary relation on cost profiles used to define the
notion of Nash equilibrium (see Definition 1.2) is total, reflexive and transitive. We thus
have the following corollary. 

Corollary 1.13. Let $(\mathcal{G}, v_0)$ be a game and $T$ be the unraveling of $G$ from $v_0$. Let $\mathrm{Trunc}_d(\mathcal{T})$
be the game played on the truncated tree of $T$ of depth $d \in \mathbb{N}$. Then there exists a subgame
perfect equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.

Let $\preceq_j$ be the relation defined by $x \preceq_j y$ iff $x \prec_j y$ or $x = y$, where $\prec_j$ is the relation
used in Definition 1.3. We notice that in the two-player case, this relation is total, reflexive
and transitive. However when there are more than two players, $\preceq_j$ is no longer total.
Nevertheless, it is proved in [LR09] that Kuhn's theorem remains true when $\preceq_j$ is only
transitive. So, the next corollary holds.

Corollary 1.14. Let $(\mathcal{G}, v_0)$ be a game and $T$ be the unraveling of $G$ from $v_0$. Let $\mathrm{Trunc}_d(\mathcal{T})$
be the game played on the truncated tree of $T$ of depth $d \in \mathbb{N}$. Then there exists a subgame
perfect secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.
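
For the Nash-type preference (each player only wants to minimize his own cost), the
equilibrium of Corollary 1.13 can be obtained by backward induction on the truncated tree.
The Python sketch below illustrates this; it is not the construction of [Kuh53] verbatim.
The function name `backward_induction`, the graph encoding, and the tie-breaking rule
(first best successor found) are assumptions, and the secure preference of Corollary 1.14
is not handled.

```python
import math

def backward_induction(succ, owner, goals, v0, d):
    """Return (strategy, cost_profile) for the tree truncated at depth d.

    succ[v]: list of successors; owner[v]: player controlling v;
    goals[p]: goal set of player p. The strategy maps each history (a tuple of
    vertices with fewer than d edges) to the chosen next vertex.
    """
    players = sorted(goals)
    strategy = {}

    def solve(history, first_visit):
        first_visit = dict(first_visit)
        depth = len(history) - 1
        for p in players:
            if p not in first_visit and history[-1] in goals[p]:
                first_visit[p] = depth          # first visit of the goal set of p
        if depth == d:                          # leaf of the truncated tree
            return tuple(first_visit.get(p, math.inf) for p in players)
        mover = players.index(owner[history[-1]])
        best = None
        for w in succ[history[-1]]:
            profile = solve(history + (w,), first_visit)
            if best is None or profile[mover] < best[mover]:
                best = profile
                strategy[history] = w
        return best

    root_profile = solve((v0,), {})
    return strategy, root_profile
```

Since every subtree is solved optimally for the player who moves at its root, no player has
a profitable deviation in any subgame of the truncated game, which is the subgame perfection
requirement.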

2. Existence of a Subgame Perfect Equilibrium 

In this section, we positively solve Problem 1 for subgame perfect equilibria.

Theorem 2.1. In every multiplayer quantitative reachability game, there exists a subgame 
perfect equilibrium. 

The proof uses techniques completely different from the ones given in [BBDP10, BBDP11]
for the existence of Nash equilibria, and secure equilibria in two-player games.

Let $(\mathcal{G}, v_0)$ be a game and $\mathcal{T}$ be the infinite game played on the unraveling $T$ of $G$
from $v_0$. Kuhn's theorem (and in particular Corollary 1.13) guarantees the existence of a
subgame perfect equilibrium in each finite game $\mathrm{Trunc}_n(\mathcal{T})$ for every depth $n \in \mathbb{N}$. Given
a sequence of such equilibria, the key point is to derive the existence of a subgame perfect
equilibrium in the infinite game $\mathcal{T}$. This is possible by the following lemma.

Lemma 2.2. Let $(\sigma^n)_{n \in \mathbb{N}}$ be a sequence of strategy profiles such that for every $n \in \mathbb{N}$, $\sigma^n$
is a strategy profile in the truncated game $\mathrm{Trunc}_n(\mathcal{T})$. Then there exists a strategy profile
$\sigma^*$ in the game $\mathcal{T}$ with the property:

$\forall d \in \mathbb{N},\ \exists n > d,\ \sigma^*$ and $\sigma^n$ coincide on histories of length up to $d$. (2.1)

Proof. This result is a direct consequence of the compactness of the set of infinite trees with
bounded outdegree [Kec95]. An alternative proof is as follows. We give a tree structure,
denoted by $\Gamma$, to the set of all strategy profiles in the games $\mathrm{Trunc}_n(\mathcal{T})$, $n \in \mathbb{N}$: the nodes
of $\Gamma$ are the strategy profiles, and we draw an edge from a strategy profile $\sigma$ in $\mathrm{Trunc}_n(\mathcal{T})$
to a strategy profile $\sigma'$ in $\mathrm{Trunc}_{n+1}(\mathcal{T})$ if and only if $\sigma$ is the restriction of $\sigma'$ to histories of
length less than $n$. It means that the nodes at depth $d$ correspond to strategy profiles of
$\mathrm{Trunc}_d(\mathcal{T})$. We then consider the tree $\Gamma'$ derived from $\Gamma$ where we only keep the nodes $\sigma^n$,
$n \in \mathbb{N}$, and their ancestors. Since $\Gamma'$ has finite outdegree, it has an infinite path by König's
lemma. This path goes through infinitely many nodes that are ancestors of nodes in the
set $\{\sigma^n, n \in \mathbb{N}\}$. Therefore there exists a strategy profile $\sigma^*$ in the infinite game $\mathcal{T}$ (given
by the previous infinite path in $\Gamma'$) with property (2.1). □
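
The selection argument behind König's lemma can be imitated on a finite family of profiles.
The Python sketch below is only an illustration and not part of the proof: it keeps, depth
by depth, the largest group of profiles sharing the same restriction, whereas the lemma
keeps a group that is infinite. The names `restrict` and `konig_prefix` and the encoding of
a profile as a dictionary from histories to chosen vertices are assumptions.

```python
from collections import defaultdict

def restrict(profile, k):
    """Restriction of a profile to the histories with fewer than k edges."""
    return frozenset((h, w) for h, w in profile.items() if len(h) - 1 < k)

def konig_prefix(profiles, depth):
    """Follow one branch of the tree of restrictions down to the given depth.

    Returns the common restriction of the surviving profiles and how many survive.
    """
    candidates = list(profiles)
    for k in range(1, depth + 1):
        groups = defaultdict(list)
        for p in candidates:
            groups[restrict(p, k)].append(p)
        # Finite stand-in for "pick a child with infinitely many descendants".
        candidates = max(groups.values(), key=len)
    return dict(restrict(candidates[0], depth)), len(candidates)
```

With infinitely many profiles and finitely many possible restrictions at each depth, one
group is always infinite, so the branch can be extended forever; its limit plays the role
of $\sigma^*$.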

Proof of Theorem 2.1. Let $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Goal}_i)_{i \in \Pi})$ be a multiplayer quantitative
reachability game, $v_0$ be an initial vertex, and $\mathcal{T}$ be the game played on the unraveling of
$G$ from $v_0$. For all $n \in \mathbb{N}$, we consider the finite game $\mathrm{Trunc}_n(\mathcal{T})$ and get a subgame perfect
equilibrium $\sigma^n = (\sigma^n_i)_{i \in \Pi}$ in this game by Corollary 1.13. According to Lemma 2.2, there
exists a strategy profile $\sigma^*$ in the game $\mathcal{T}$ with property (2.1).

It remains to show that $\sigma^*$ is a subgame perfect equilibrium in $\mathcal{T}$, and thus in $(\mathcal{G}, v_0)$. Let
$hv$ be a history of the game (with $v \in V$). We have to prove that $\sigma^*|_h$ is a Nash equilibrium
in the subgame $(\mathcal{T}|_h, v)$. As a contradiction, suppose that there exists a profitable deviation
$\sigma'_j$ for some player $j \in \Pi$ w.r.t. $\sigma^*|_h$ in $(\mathcal{T}|_h, v)$. This means that $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ for
$\rho = h\langle\sigma^*|_h\rangle_v$ and $\rho' = h\langle\sigma'_j|_h, \sigma^*_{-j}|_h\rangle_v$, that is, $\rho'$ visits $\mathrm{Goal}_j$ for the first time at a certain
depth $d$, such that $|h| < d < +\infty$, and $\rho$ visits $\mathrm{Goal}_j$ at a depth strictly greater than $d$ (see
Figure 6). Thus:

$$\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho') = d.$$





Figure 6: The game $\mathcal{T}$ with its subgame $(\mathcal{T}|_h, v)$.

According to property (2.1), there exists $n > d$ such that $\sigma^*$ coincides with $\sigma^n$ on histories
of length up to $d$. It follows that for $\pi = h\langle\sigma^n|_h\rangle_v$ and $\pi' = h\langle\sigma'_j|_h, \sigma^n_{-j}|_h\rangle_v$, we have that
(see Figure 6)

$$\mathrm{Cost}_j(\pi') = \mathrm{Cost}_j(\rho') = d \quad\text{and}\quad \mathrm{Cost}_j(\pi) > d,$$

as $\pi'$ and $\rho'$ coincide up to depth $d$. And so, $\sigma'_j$ is a profitable deviation for player $j$ w.r.t.
$\sigma^n|_h$ in $(\mathrm{Trunc}_n(\mathcal{T})|_h, v)$, which leads to a contradiction with the fact that $\sigma^n$ is a subgame
perfect equilibrium in $\mathrm{Trunc}_n(\mathcal{T})$ by hypothesis. □

As an extension, we consider multiplayer quantitative reachability games with tuples of 
costs on edges (as in [BBDP11]). In these games, we assume that edges are labelled with
tuples of strictly positive costs (one cost for each player). Here we do not only count the 
number of edges to reach the goal of a player, but we sum up his costs along the path until 
his goal is reached. His aim is still to minimize his global cost for a play. Let us give the 
formal definition. 

Definition 2.3. A multiplayer quantitative reachability game with tuples of costs on edges
is a tuple $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi}, (\mathrm{Goal}_i)_{i \in \Pi})$ where

• $\Pi$ is a finite set of players,

• $G = (V, (V_i)_{i \in \Pi}, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_i)_{i \in \Pi}$ is a
partition of $V$ into the state sets of each player, and $E \subseteq V \times V$ is the set of edges, such
that for all $v \in V$, there exists $v' \in V$ with $(v, v') \in E$,

• $\mathrm{Cost}_i : E \to \mathbb{R}^{>0}$ is the cost function of player $i$ defined on the edges of the graph,

• $\mathrm{Goal}_i \subseteq V$ is the non-empty goal set of player $i$.

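In this context, we adapt the definition of $\mathrm{Cost}_i(\rho)$, the cost of player $i$ for a play $\rho = \rho_0\rho_1\ldots$:

$$\mathrm{Cost}_i(\rho) = \begin{cases} \sum_{k=1}^{l} \mathrm{Cost}_i((\rho_{k-1}, \rho_k)) & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Goal}_i, \\ +\infty & \text{otherwise.} \end{cases} \qquad (2.2)$$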



In this framework, we also prove the existence of a subgame perfect equilibrium. The proof
is similar to the one of Theorem 2.1; the only difference lies in the choice of the considered
depth $d$.
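
As an illustration of the cost function (2.2), the following Python sketch sums the edge
costs of one player along a finite prefix of a play until his goal set is first visited.
The function name `weighted_cost` and the dictionary `edge_cost` (one entry per edge, for a
fixed player) are assumptions of this sketch; on a finite prefix, an infinite cost only
means that the goal set was not visited within that prefix.

```python
import math

def weighted_cost(play, goal, edge_cost):
    """Sum of the edge costs up to the first visit of the goal set, else +infinity."""
    total = 0.0
    for k, v in enumerate(play):
        if k > 0:
            total += edge_cost[(play[k - 1], v)]   # cost of the edge (rho_{k-1}, rho_k)
        if v in goal:
            return total                           # least index l with rho_l in the goal set
    return math.inf
```

When the initial vertex already belongs to the goal set, the sum is empty and the cost is 0.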

Theorem 2.4. In every multiplayer quantitative reachability game with tuples of costs on 
edges, there exists a subgame perfect equilibrium. 

Let us introduce some notations that will be useful for the proof of this theorem. We
define $c_{\min} := \min_{i \in \Pi} \min_{e \in E} \mathrm{Cost}_i(e)$, $c_{\max} := \max_{i \in \Pi} \max_{e \in E} \mathrm{Cost}_i(e)$ and $K := \lceil c_{\max} / c_{\min} \rceil$. It
is clear that $c_{\min}, c_{\max} > 0$ and $K \ge 1$.



Proof of Theorem 2.4. Let $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi}, (\mathrm{Goal}_i)_{i \in \Pi})$ be a multiplayer
quantitative reachability game with tuples of costs on edges, $v_0$ be an initial vertex, and
$\mathcal{T}$ be the game played on the unraveling of $G$ from $v_0$. For all $n \in \mathbb{N}$, we consider the
finite game $\mathrm{Trunc}_n(\mathcal{T})$ and get a subgame perfect equilibrium $\sigma^n = (\sigma^n_i)_{i \in \Pi}$ in this game by
Corollary 1.13. According to Lemma 2.2, there exists a strategy profile $\sigma^*$ in the game $\mathcal{T}$
with property (2.1).

We then show that $\sigma^*$ is a subgame perfect equilibrium in $\mathcal{T}$, and thus in $(\mathcal{G}, v_0)$. Let
$hv$ be a history of the game ($v \in V$). We have to prove that $\sigma^*|_h$ is a Nash equilibrium in
the subgame $(\mathcal{T}|_h, v)$. As a contradiction, suppose that there exists a profitable deviation
$\sigma'_j$ for some player $j \in \Pi$ w.r.t. $\sigma^*|_h$ in $(\mathcal{T}|_h, v)$. This means that $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ for
$\rho = h\langle\sigma^*|_h\rangle_v$ and $\rho' = h\langle\sigma'_j|_h, \sigma^*_{-j}|_h\rangle_v$. Thus $\rho'$ visits $\mathrm{Goal}_j$ for the first time at a certain
depth $d'$, such that $|h| < d' < +\infty$.






T. BRIHAYE, V. BRUYERE, J. DE PRIL, AND H. GIMBERT 



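We define some depth $d$ depending on the fact that $\rho$ visits $\mathrm{Goal}_j$ or not.

$$d := \begin{cases} \max\{d', d''\} & \text{if } \rho \text{ visits } \mathrm{Goal}_j \text{ for the first time at depth } d'', \\ d' \cdot K & \text{if } \rho \text{ does not visit } \mathrm{Goal}_j. \end{cases}$$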

According to property (2.1), there exists $n > d$ such that $\sigma^*$ coincides with $\sigma^n$ on histories
of length up to $d$. For $\pi = h\langle\sigma^n|_h\rangle_v$ and $\pi' = h\langle\sigma'_j|_h, \sigma^n_{-j}|_h\rangle_v$, since $d \ge d'$, it follows that:

$$\mathrm{Cost}_j(\pi') = \mathrm{Cost}_j(\rho').$$

If $\rho$ visits $\mathrm{Goal}_j$, then it holds that $\mathrm{Cost}_j(\pi) = \mathrm{Cost}_j(\rho)$ by definition of $d$, and so
$\mathrm{Cost}_j(\pi) > \mathrm{Cost}_j(\pi')$. If $\rho$ does not visit $\mathrm{Goal}_j$, then the following inequalities hold:

$$\mathrm{Cost}_j(\pi') \le d' \cdot c_{\max} \le d \cdot c_{\min} < \mathrm{Cost}_j(\pi).$$

The first inequality comes from the fact that $\pi'$ visits $\mathrm{Goal}_j$ at depth $d'$, the second one
from the definition of $d$, and the last one from the fact that if $\pi$ visits $\mathrm{Goal}_j$, it must happen
after depth $d$ (as $\rho$ does not visit $\mathrm{Goal}_j$).

In both cases $\mathrm{Cost}_j(\pi) > \mathrm{Cost}_j(\pi')$, and we conclude that $\sigma'_j$ is a profitable deviation
for player $j$ w.r.t. $\sigma^n|_h$ in $(\mathrm{Trunc}_n(\mathcal{T})|_h, v)$, which leads to a contradiction with the fact
that $\sigma^n$ is a subgame perfect equilibrium in $\mathrm{Trunc}_n(\mathcal{T})$ by hypothesis. □
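
The choice of the depth $d$ in this proof can be checked numerically. The Python sketch
below assumes that $K$ is the ratio $c_{\max} / c_{\min}$ rounded up to an integer; the
function name `deviation_depth` and the nested dictionary `edge_costs` (player, then edge)
are hypothetical and only serve the illustration.

```python
import math

def deviation_depth(edge_costs, d_prime, d_second=None):
    """Depth d used in the proof of Theorem 2.4.

    d_prime: depth at which the deviation first visits the goal set of player j.
    d_second: depth at which the original outcome first visits it, or None if never.
    """
    all_costs = [c for per_player in edge_costs.values() for c in per_player.values()]
    c_min, c_max = min(all_costs), max(all_costs)
    K = math.ceil(c_max / c_min)       # assumption: K is the rounded-up ratio
    if d_second is not None:
        return max(d_prime, d_second)
    return d_prime * K                 # guarantees d_prime * c_max <= d * c_min
```

For instance, with $c_{\min} = 2$, $c_{\max} = 5$ and $d' = 4$ we get $K = 3$ and $d = 12$, so that
$d' \cdot c_{\max} = 20 \le 24 = d \cdot c_{\min}$, as required by the second inequality of the proof.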

Remark 2.5. We can transform the cost functions $(\mathrm{Cost}_i)_{i \in \Pi}$ ((1.1) or (2.2)) of our games
in the following way: for any player $i$ and any play $\rho$,

$$\mathrm{Cost}'_i(\rho) = \begin{cases} 1 - \frac{1}{c+1} & \text{if } \mathrm{Cost}_i(\rho) = c \in \mathbb{R}^{+}, \\ 1 & \text{if } \mathrm{Cost}_i(\rho) = +\infty. \end{cases}$$

These new cost functions $(\mathrm{Cost}'_i)_{i \in \Pi}$ are bounded and continuous (in the product topology on
$V^\omega$). Moreover, a subgame perfect equilibrium in a game with the cost functions $(\mathrm{Cost}_i)_{i \in \Pi}$
is a subgame perfect equilibrium in this game with the new cost functions $(\mathrm{Cost}'_i)_{i \in \Pi}$, and
conversely. Then, Theorems 2.1 and 2.4 are consequences of [Har85, FL83].
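
The transformation of Remark 2.5 is easy to experiment with. The Python sketch below
assumes the map $c \mapsto 1 - 1/(c+1)$ on finite costs and the value 1 for an infinite
cost; the function name `bounded_cost` is ours.

```python
import math

def bounded_cost(c):
    """Map a cost in [0, +infinity] to [0, 1] while preserving the order of costs."""
    if math.isinf(c):
        return 1.0
    return 1.0 - 1.0 / (c + 1.0)
```

Because the map is strictly increasing, a deviation is profitable for the original costs
exactly when it is profitable for the transformed ones, which is why both games have the
same subgame perfect equilibria.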



3. Existence of a Subgame Perfect Secure Equilibrium 

Regarding subgame perfect secure equilibria, we positively solve Problem 1 but only in the
case of two-player games.

Theorem 3.1. In every two-player quantitative reachability game, there exists a subgame 
perfect secure equilibrium. 

The main ideas of the proof are similar to the ones for Theorem 2.1.

Proof of Theorem 3.1. Let $\mathcal{G} = (\Pi, V, V_1, V_2, E, \mathrm{Goal}_1, \mathrm{Goal}_2)$ be a two-player quantitative
reachability game, $v_0$ be an initial vertex, and $\mathcal{T}$ be the game played on the unraveling of $G$
from $v_0$. For every $n \in \mathbb{N}$, we consider the finite game $\mathrm{Trunc}_n(\mathcal{T})$ and get a subgame perfect
secure equilibrium $\sigma^n = (\sigma^n_1, \sigma^n_2)$ in this game by Corollary 1.14. According to Lemma 2.2,
there exists a strategy profile $\sigma^*$ in the game $\mathcal{T}$ such that $\sigma^*$ has property (2.1).

We show that $\sigma^* = (\sigma^*_1, \sigma^*_2)$ is a subgame perfect secure equilibrium in $\mathcal{T}$. Let $hv$ be
a history of the game ($v \in V$). We have to prove that $\sigma^*|_h$ is a secure equilibrium in the
subgame $(\mathcal{T}|_h, v)$. As a contradiction, suppose that there exists a $\prec_j$-profitable deviation
$\sigma'_j$ for some player $j \in \{1, 2\}$ w.r.t. $\sigma^*|_h$ in $(\mathcal{T}|_h, v)$. Let us assume w.l.o.g. that $j = 1$. As
$\sigma^*|_h$ is a Nash equilibrium in $(\mathcal{T}|_h, v)$ (see the proof of Theorem 2.1), we know that

$$\mathrm{Cost}_1(\rho) = \mathrm{Cost}_1(\rho') \quad\text{and}\quad \mathrm{Cost}_2(\rho) < \mathrm{Cost}_2(\rho') \qquad (3.1)$$

where $\rho = h\langle\sigma^*_1|_h, \sigma^*_2|_h\rangle_v$ and $\rho' = h\langle\sigma'_1|_h, \sigma^*_2|_h\rangle_v$. Thus it implies that $\mathrm{Cost}_2(\rho)$ is finite. Let
$d$ be the maximum between $\mathrm{Cost}_1(\rho)$ and $\mathrm{Cost}_2(\rho)$ if $\mathrm{Cost}_1(\rho)$ is finite, or $\mathrm{Cost}_2(\rho)$ otherwise.
Remark that $d > |h|$. According to property (2.1), there exists $n > d$ such that the strategy
profiles $\sigma^*$ and $\sigma^n$ coincide on histories of length up to $d$.

Let us show that $\sigma'_1$ would then be a $\prec_1$-profitable deviation for player 1 w.r.t. $\sigma^n|_h$ in
$(\mathrm{Trunc}_n(\mathcal{T})|_h, v)$. In this aim we first prove that

$$\mathrm{Cost}_2(\pi) < \mathrm{Cost}_2(\pi') \qquad (3.2)$$

where $\pi = h\langle\sigma^n_1|_h, \sigma^n_2|_h\rangle_v$ and $\pi' = h\langle\sigma'_1|_h, \sigma^n_2|_h\rangle_v$ are finite plays in $\mathrm{Trunc}_n(\mathcal{T})$ (see Fig. 7).
By definition of $d$ and according to property (2.1), we have that $\mathrm{Cost}_2(\pi) = \mathrm{Cost}_2(\rho) \le d$.
If $\mathrm{Cost}_2(\rho') = \mathrm{Cost}_2(\pi')$, Equation (3.1) implies that $\mathrm{Cost}_2(\pi) < \mathrm{Cost}_2(\pi')$. Otherwise, we
have that $\mathrm{Cost}_2(\pi') > d$ as $\rho'$ and $\pi'$ coincide until depth $d$ (by property (2.1)), and then
$\mathrm{Cost}_2(\pi) \le d < \mathrm{Cost}_2(\pi')$.




Figure 7: The game $\mathcal{T}$ with its subgame $(\mathcal{T}|_h, v)$.

We now consider $\mathrm{Cost}_1(\pi)$ and $\mathrm{Cost}_1(\pi')$. Let us study the next two cases.

• If $\mathrm{Cost}_1(\rho) < +\infty$, then we have that

$$\mathrm{Cost}_1(\pi) = \mathrm{Cost}_1(\pi') \qquad (3.3)$$

because $\mathrm{Cost}_1(\rho') = \mathrm{Cost}_1(\rho) = \mathrm{Cost}_1(\pi) = \mathrm{Cost}_1(\pi') \le d$ by Equation (3.1), property
(2.1) and definition of $d$.

• If $\mathrm{Cost}_1(\rho) = +\infty$, then we show that $\mathrm{Cost}_1(\pi) = +\infty$, and as a consequence we get that

$$\mathrm{Cost}_1(\pi) \ge \mathrm{Cost}_1(\pi'). \qquad (3.4)$$

As a contradiction suppose that $\mathrm{Cost}_1(\pi) < +\infty$. Consider vertex $\rho_d$, the first vertex
of $\rho$ that belongs to $\mathrm{Goal}_2$ (we recall that $\mathrm{Cost}_2(\rho) = d$). Suppose that player 1 has
a winning strategy to reach his goal from vertex $\rho_d$ in the zero-sum reachability game
$\mathcal{G}_1 = (G_1, \mathrm{Goal}_1, V)$ (as defined in Section 1.2). Then this contradicts the fact that $\sigma^*$
is a subgame perfect equilibrium in $\mathcal{T}$ (see the proof of Theorem 2.1). Therefore, by
determinacy of $\mathcal{G}_1$ (Corollary 1.11), player 2 has a winning strategy from vertex $\rho_d$ to
prevent player 1 from reaching $\mathrm{Goal}_1$. But in this case, this strategy is a $\prec_2$-profitable
deviation w.r.t. $\sigma^n|_h$ in $(\mathrm{Trunc}_n(\mathcal{T})|_h, v)$, because player 2 can keep his cost while strictly
increasing player 1's cost. This is impossible as $\sigma^n$ is a subgame perfect secure equilibrium
in $\mathrm{Trunc}_n(\mathcal{T})$. Thus, we must have that $\mathrm{Cost}_1(\pi) = +\infty$.

In all possible situations, we proved that $\sigma'_1$ is a $\prec_1$-profitable deviation for player 1 w.r.t.
$\sigma^n|_h$ in $(\mathrm{Trunc}_n(\mathcal{T})|_h, v)$ because either $\mathrm{Cost}_1(\pi) = \mathrm{Cost}_1(\pi')$ and $\mathrm{Cost}_2(\pi) < \mathrm{Cost}_2(\pi')$, or
$\mathrm{Cost}_1(\pi) > \mathrm{Cost}_1(\pi')$ (see (3.2)-(3.4)). So we get a contradiction with the fact that $\sigma^n$ is a
subgame perfect secure equilibrium in $\mathrm{Trunc}_n(\mathcal{T})$ by hypothesis. □

Unfortunately the proof does not seem to extend to the multiplayer case. Indeed we 
face the same kind of problems encountered in [BBDP10, BBDP11], where the existence of
secure equilibria is proved for two-player games and left open for multiplayer games. 

4. Decidability of the Existence of a Secure Equilibrium 

In this section, we study Problems 1 and 2 in the context of secure equilibria. Both problems
have been positively solved in [BBDP10] for two-player games only. To the best of our
knowledge, the existence of secure equilibria in the multiplayer framework is still an open
problem. We here provide an algorithm that decides the existence of a secure equilibrium.
We also show that if there exists a secure equilibrium, then there exists one that is
finite-memory.

Theorem 4.1. In every multiplayer quantitative reachability game, one can decide whether 
there exists a secure equilibrium in ExpSpace. 

Theorem 4.2. If there exists a secure equilibrium in a multiplayer quantitative reachability 
game, then there exists one that is finite-memory. 

The proof of Theorem 4.1 is inspired from ideas developed in [BBDP10, BBDP11]. The
key point is to show that the existence of a secure equilibrium in a game $(\mathcal{G}, v_0)$ is equivalent
to the existence of a secure equilibrium (with two additional properties) in the finite game
$\mathrm{Trunc}_d(\mathcal{T})$ for a well-chosen depth $d$. The existence of the latter equilibrium is decidable.
Notice that by Corollary 1.14, a secure equilibrium always exists in $\mathrm{Trunc}_d(\mathcal{T})$; however we
do not know if a secure equilibrium with the two required additional properties always exists
in $\mathrm{Trunc}_d(\mathcal{T})$.

Let us formally introduce these two properties. The first one requires that the secure
equilibrium is goal-optimized, meaning that all the goal sets visited along its outcome are
visited for the first time before a certain given depth. For any game $\mathcal{G}$ played on a graph
with $|V|$ vertices by $|\Pi|$ players, we fix the following constant: $d_{\mathrm{goal}}(\mathcal{G}) := 2 \cdot |\Pi| \cdot |V|$.

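Definition 4.3. Given a game $(\mathcal{G}, v_0)$ and a strategy profile $(\sigma_i)_{i \in \Pi}$ in $\mathcal{G}$, with outcome $\rho$,
we say that $(\sigma_i)_{i \in \Pi}$ is goal-optimized if and only if for all $i \in \Pi$ such that $\mathrm{Cost}_i(\rho) < +\infty$,
we have that $\mathrm{Cost}_i(\rho) < d_{\mathrm{goal}}(\mathcal{G})$.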

The second property asks for a secure equilibrium that is deviation-optimized, meaning
that whenever a player deviates from its outcome, he realizes within a certain given number
of steps that his deviation is not profitable for him.

Definition 4.4. Given a game $(\mathcal{G}, v_0)$ and a secure equilibrium $(\sigma_i)_{i \in \Pi}$ in $\mathcal{G}$, with outcome
$\rho$, we say that $(\sigma_i)_{i \in \Pi}$ is deviation-optimized if and only if for every player $j \in \Pi$ and every
strategy $\sigma'_j$ of player $j$, we have that

$$\mathrm{Cost}(\rho_{\le d_{\mathrm{dev}}}) \not\prec_j \mathrm{Cost}(\rho'_{\le d_{\mathrm{dev}}}),$$

where $d_{\mathrm{dev}} = \max\{\mathrm{Cost}_i(\rho) \mid \mathrm{Cost}_i(\rho) < +\infty\} + |V|$ and $\rho' = \langle\sigma'_j, \sigma_{-j}\rangle_{v_0}$.

Remark that Definitions 4.3 and 4.4 extend to games $\mathrm{Trunc}_d(\mathcal{T})$ where $d > d_{\mathrm{goal}}(\mathcal{G})$.
We can now state the key proposition for proving Theorems 4.1 and 4.2.

Proposition 4.5. Let $(\mathcal{G}, v_0)$ be a game, and $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$.

(i) If there exists a goal-optimized and deviation-optimized secure equilibrium in
$\mathrm{Trunc}_d(\mathcal{T})$, then there exists a secure equilibrium in $(\mathcal{G}, v_0)$ that is finite-memory.

(ii) If there exists a secure equilibrium in $(\mathcal{G}, v_0)$, then there exists a goal-optimized and
deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.

At this stage, it is difficult to give some intuition about the choice of the values $d_{\mathrm{goal}}(\mathcal{G})$,
$d_{\mathrm{dev}}$ and $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$. These values are linked to the proofs contained in this
section.
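
The constants and the two properties can nevertheless be written down concretely. The
Python sketch below is an illustration under assumptions: cost profiles are tuples indexed
from 0, the relation $\prec_j$ is read as "player $j$ strictly decreases his cost, or keeps
it while no cost decreases and at least one strictly increases" (our reading of
Definition 1.3, which is not restated in this section), and all function names are
hypothetical.

```python
import math

def profitable(x, y, j):
    """x precedes y for player j: moving from profile x to profile y is profitable."""
    if y[j] < x[j]:
        return True
    return (y[j] == x[j]
            and all(xi <= yi for xi, yi in zip(x, y))
            and any(xi < yi for xi, yi in zip(x, y)))

def d_goal(num_players, num_vertices):
    """Constant d_goal = 2 * |Pi| * |V|."""
    return 2 * num_players * num_vertices

def d_dev(costs, num_vertices):
    """Largest finite cost of the outcome plus |V| (0 is used if no cost is finite)."""
    finite = [c for c in costs if not math.isinf(c)]
    return max(finite, default=0) + num_vertices

def truncation_depth(num_players, num_vertices):
    """Depth d = d_goal + 3 * |V| used in Proposition 4.5."""
    return d_goal(num_players, num_vertices) + 3 * num_vertices

def is_goal_optimized(costs, num_players, num_vertices):
    """Every finite cost of the outcome is below d_goal."""
    bound = d_goal(num_players, num_vertices)
    return all(c < bound for c in costs if not math.isinf(c))
```

For a game with 3 players and 4 vertices this gives $d_{\mathrm{goal}} = 24$ and $d = 36$; an
outcome with cost profile $(0, 0, 5)$ yields $d_{\mathrm{dev}} = 9$.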

Proof of Theorem 4.1. By Proposition 4.5, there exists a secure equilibrium in $(\mathcal{G}, v_0)$ iff
there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$, with
$d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$. The latter property is decidable in NExpSpace (in $|V|$ and $|\Pi|$).
Indeed, $\mathrm{Trunc}_d(\mathcal{T})$ has an exponential size. Guessing a strategy profile $(\sigma_i)_{i \in \Pi}$ in this tree
also needs an exponential size. Then we can test in exponential size whether $(\sigma_i)_{i \in \Pi}$ is
a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$. By Savitch's
theorem, deciding the existence of a secure equilibrium is thus in ExpSpace. □

Proof of Theorem 4.2. This theorem is a direct consequence of Proposition 4.5. Indeed
consider a secure equilibrium in a game $(\mathcal{G}, v_0)$. We first apply Proposition 4.5 (Part (ii))
to this strategy profile to get a goal-optimized and deviation-optimized secure equilibrium
$(\sigma_i)_{i \in \Pi}$ in $\mathrm{Trunc}_d(\mathcal{T})$, for $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$. Then we apply Proposition 4.5 (Part (i))
to the equilibrium $(\sigma_i)_{i \in \Pi}$, to get a finite-memory secure equilibrium back in $(\mathcal{G}, v_0)$. □

Let us remark that in Theorem 4.2, the finite-memory secure equilibrium is created
from the one given by hypothesis and the construction is made in such a way that the set
of players whose goal set is visited along the outcome is the same for both equilibria.

The proof of Proposition 4.5 is long and technical. The next two sections are devoted
to the two parts of this proposition.

4.1. Part (i) of Proposition 4.5. This section is devoted to the proof of Proposition 4.5,
Part (i). We begin with a useful characterisation of a deviation-optimized secure
equilibrium.

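Lemma 4.6. With the previous notations of Definition 4.4, a secure equilibrium $(\sigma_i)_{i \in \Pi}$ is
deviation-optimized if and only if for every player $j \in \Pi$ and every strategy $\sigma'_j$ of player $j$,
if

(i) $\mathrm{Cost}_j(\rho) = \mathrm{Cost}_j(\rho')$,

(ii) $\forall i \in \Pi$ such that $\mathrm{Cost}_i(\rho) < +\infty$, we have that $\mathrm{Cost}_i(\rho) \le \mathrm{Cost}_i(\rho')$,

(iii) $\exists i \in \Pi$ $\mathrm{Cost}_i(\rho) < \mathrm{Cost}_i(\rho')$,

then there exists $l \in \Pi$ such that $\mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\rho') \le d_{\mathrm{dev}}$.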

Proof. Let us first assume that $(\sigma_i)_{i \in \Pi}$ is a deviation-optimized secure equilibrium whose
outcome is denoted by $\rho$. Given any player $j \in \Pi$, let $\sigma'_j$ be a strategy fulfilling the
hypotheses of the lemma and $\rho'$ the outcome given by $\langle\sigma'_j, \sigma_{-j}\rangle_{v_0}$. Let us denote respectively
by $(x_i)_{i \in \Pi}$ and $(y_i)_{i \in \Pi}$ the cost profiles of the histories $\rho_{\le d_{\mathrm{dev}}}$ and $\rho'_{\le d_{\mathrm{dev}}}$.

Notice that by definition of $d_{\mathrm{dev}}$, $\mathrm{Cost}_i(\rho) = x_i$ for all $i$. For $\rho'$, we have $\mathrm{Cost}_i(\rho') = y_i$
provided $\mathrm{Cost}_i(\rho') \le d_{\mathrm{dev}}$. Otherwise, it may happen that $y_i = +\infty$ and $\mathrm{Cost}_i(\rho') < +\infty$.
So, it holds that $\mathrm{Cost}_i(\rho') \le y_i$ for all $i$. These observations will be often used in the sequel
of the proof.

Since $(\sigma_i)_{i \in \Pi}$ is deviation-optimized, we have $\mathrm{Cost}(\rho_{\le d_{\mathrm{dev}}}) \not\prec_j \mathrm{Cost}(\rho'_{\le d_{\mathrm{dev}}})$ meaning
that:

$$(x_j \le y_j) \wedge \big(x_j \ne y_j \vee (\exists i \in \Pi\ x_i > y_i) \vee (\forall i \in \Pi\ x_i \ge y_i)\big). \qquad (4.1)$$

By hypothesis (i), $x_j = y_j$. By hypothesis (iii), we cannot have $\forall i \in \Pi\ x_i \ge y_i$. Therefore
to satisfy (4.1), there must exist a player $i$ such that $x_i > y_i$. If $\mathrm{Cost}_i(\rho) < +\infty$, then
by definition of $d_{\mathrm{dev}}$, $\mathrm{Cost}_i(\rho) = x_i > y_i = \mathrm{Cost}_i(\rho')$ in contradiction with hypothesis (ii).
Therefore $\mathrm{Cost}_i(\rho) = +\infty$. From $x_i > y_i$, it follows that $\mathrm{Cost}_i(\rho') \le d_{\mathrm{dev}}$, which concludes
the first implication of the proof.

For the converse, let us now assume that $(\sigma_i)_{i \in \Pi}$ is a secure equilibrium that fulfills the
property stated in Lemma 4.6. We will prove that it is deviation-optimized, that is, for any
player $j \in \Pi$, and any deviation $\sigma'_j$ of player $j$, we have that $\mathrm{Cost}(\rho_{\le d_{\mathrm{dev}}}) \not\prec_j \mathrm{Cost}(\rho'_{\le d_{\mathrm{dev}}})$,
with $\rho = \langle(\sigma_i)_{i \in \Pi}\rangle_{v_0}$ and $\rho' = \langle\sigma'_j, \sigma_{-j}\rangle_{v_0}$. By denoting respectively by $(x_i)_{i \in \Pi}$ and $(y_i)_{i \in \Pi}$
the cost profiles of $\rho_{\le d_{\mathrm{dev}}}$ and $\rho'_{\le d_{\mathrm{dev}}}$, it is equivalent to prove (4.1).

Since $(\sigma_i)_{i \in \Pi}$ is a secure equilibrium, we know that $\sigma'_j$ is not a $\prec_j$-profitable deviation.
In particular, player $j$ cannot strictly decrease his cost along $\rho'$, and thus $x_j \le y_j$. It
remains to prove that the second conjunct of (4.1) is true. For this, we first show that
as soon as one of the hypotheses among (i), (ii) or (iii) is not fulfilled, this conjunct is
satisfied.

• If $\mathrm{Cost}_j(\rho) < \mathrm{Cost}_j(\rho')$, by choice of $d_{\mathrm{dev}}$, we also have that $x_j < y_j$. Moreover, the case
$\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ is not possible as $(\sigma_i)_{i \in \Pi}$ is a secure equilibrium.

• If there exists $i \in \Pi$ such that $\mathrm{Cost}_i(\rho) < +\infty$ and $\mathrm{Cost}_i(\rho) > \mathrm{Cost}_i(\rho')$, then $x_i > y_i$.

• If for all $i \in \Pi$, $\mathrm{Cost}_i(\rho) \ge \mathrm{Cost}_i(\rho')$, we also have that $x_i \ge y_i$, for all $i$.

Thus the remaining deviations to consider fulfill hypotheses (i), (ii) and (iii). In this case,
there exists $l \in \Pi$ such that $\mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\rho') \le d_{\mathrm{dev}}$. In particular we have
that $x_l > y_l$, and the second conjunct of (4.1) is true. □

The ideas of the proof for Part (i) of Proposition 4.5 are as follows. Suppose that there
exists a goal-optimized and deviation-optimized secure equilibrium $(\sigma_i)_{i \in \Pi}$ in $\mathrm{Trunc}_d(\mathcal{T})$,
for $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$. To get from $(\sigma_i)_{i \in \Pi}$ a finite-memory secure equilibrium in $(\mathcal{G}, v_0)$,
we use a construction similar to [BBDP11, Proposition 25] where it is shown, in the context
of two-player games, how to extend a secure equilibrium in a finite truncation of $(\mathcal{G}, v_0)$ to
a secure equilibrium in $(\mathcal{G}, v_0)$. The rough idea is as follows. Due to the hypotheses, the
outcome $\pi$ of $(\sigma_i)_{i \in \Pi}$ has a prefix $\alpha\beta$ such that all goal sets visited by $\pi$ are already visited
by $\alpha$, and such that $\beta$ is a cycle. The required secure equilibrium is specified such that its
outcome is equal to $\alpha\beta^\omega$ and any deviating player is punished by the coalition of the other
players in a way that this deviation is not profitable for him. This secure equilibrium can
be constructed in a way to be finite-memory.

Proof of Proposition 4.5, Part (i). Let us set $\Pi = \{1, \ldots, n\}$. Let $(\tau_i)_{i \in \Pi}$ be a goal-optimized
and deviation-optimized secure equilibrium in the game $\mathrm{Trunc}_d(\mathcal{T})$ and $\pi$ its outcome. Since
$|\pi| = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$, we can write

$\pi = \alpha\beta\gamma$ with $\beta$ non-empty

$\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$

$|\alpha| \ge d_{\mathrm{goal}}(\mathcal{G}) + |V|$

$|\alpha\beta| \le d_{\mathrm{goal}}(\mathcal{G}) + 2 \cdot |V|$.

We have that $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$ (no new goal set is visited after $\alpha$) because $|\alpha| > d_{\mathrm{goal}}(\mathcal{G})$
and $(\tau_i)_{i \in \Pi}$ is goal-optimized. This enables us to use [BBDP11, Lemma 15] as follows. Let
$j \in \Pi$ be such that $\alpha$ does not visit $\mathrm{Goal}_j$, and suppose that player $j$ deviates from the history
$\alpha$. This lemma states that for all histories $hv$ consistent with $\tau_{-j}$ and such that $|hv| < |\alpha\beta|$,
the coalition formed by all the players $i \in \Pi \setminus \{j\}$ can play to prevent player $j$ from
reaching his goal set $\mathrm{Goal}_j$ from vertex $v$. It means that this coalition has a memoryless
winning strategy $\nu^v_{-j}$ from vertex $v$ in the zero-sum reachability game $\mathcal{G}_j = (G_j, \mathrm{Goal}_j, V)$
(see Corollary 1.11). For each player $i \ne j$, let $\nu^v_{i,j}$ be the memoryless strategy of player $i$
in $\mathcal{G}$ induced by $\nu^v_{-j}$.
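
The decomposition $\pi = \alpha\beta\gamma$ is a pigeonhole argument: among the $|V| + 1$ positions
of $\pi$ between depths $d_{\mathrm{goal}}(\mathcal{G}) + |V|$ and $d_{\mathrm{goal}}(\mathcal{G}) + 2 \cdot |V|$, some vertex
must repeat. The Python sketch below illustrates it; the function name `split_lasso` and the
convention that the length of a history is its number of edges are assumptions.

```python
def split_lasso(pi, lower, num_vertices):
    """Split pi into (alpha, beta, gamma) with Last(alpha) == Last(alpha beta).

    pi: list of vertices with more than lower + num_vertices entries.
    Guarantees |alpha| >= lower and |alpha beta| <= lower + num_vertices,
    where lengths are counted in edges.
    """
    seen = {}
    for k in range(lower, lower + num_vertices + 1):
        v = pi[k]
        if v in seen:
            p = seen[v]
            # alpha ends at position p, beta is the cycle from p back to the same vertex.
            return pi[:p + 1], pi[p + 1:k + 1], pi[k + 1:]
        seen[v] = k
    raise ValueError("no repetition: pi uses more than num_vertices distinct vertices")
```

In the proof, `lower` would be $d_{\mathrm{goal}}(\mathcal{G}) + |V|$, and the returned `beta` is non-empty by
construction.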

We define a finite-memory secure equilibrium in the game $\mathcal{T}$ using the same idea as in
the proof of [BBDP11, Proposition 25]. The idea is to specify the required secure equilibrium
as follows: each player $i$ plays according to $\alpha\beta^\omega$ (which is the outcome of this equilibrium)
and punishes player $j \ne i$ if he deviates from $\alpha\beta^\omega$, by playing according to $\tau_i$ until depth
$|\alpha|$, and after that, by playing arbitrarily if $\alpha$ visits $\mathrm{Goal}_j$, and according to $\nu^v_{i,j}$ otherwise
(where $v$ is the vertex visited at depth $|\alpha|$ when deviating).

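Formally we first need to specify a punishment function $P$. For the initial vertex $v_0$,
we define $P(v_0) = \bot$ and for all histories $hv \in H$ such that $h \in H_i$, we let:

$$P(hv) = \begin{cases} \bot & \text{if } P(h) = \bot \text{ and } hv < \alpha\beta^\omega, \\ i & \text{if } P(h) = \bot \text{ and } hv \not< \alpha\beta^\omega, \\ P(h) & \text{otherwise } (P(h) \ne \bot). \end{cases}$$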

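Then the definition of the secure equilibrium $(\sigma_i)_{i \in \Pi}$ in $\mathcal{T}$ is as follows. For all $i \in \Pi$ and
$h \in H_i$,

$$\sigma_i(h) = \begin{cases} v & \text{if } P(h) = \bot\ (h < \alpha\beta^\omega), \text{ such that } hv < \alpha\beta^\omega, \\ \text{arbitrary} & \text{if } P(h) = i, \\ \tau_i(h) & \text{if } P(h) \ne \bot, i \text{ and } |h| < |\alpha|, \\ \nu^v_{i,P(h)}(h) & \text{if } P(h) \ne \bot, i,\ |h| \ge |\alpha|,\ \alpha \text{ does not visit } \mathrm{Goal}_{P(h)}, \\ & \text{such that } \exists h'v \le h \text{ with } |h'v| = |\alpha|, \\ \text{arbitrary} & \text{otherwise } (P(h) \ne \bot, i,\ |h| \ge |\alpha| \text{ and } \alpha \text{ visits } \mathrm{Goal}_{P(h)}), \end{cases}$$

where arbitrary means that the next vertex is chosen arbitrarily (in a memoryless way).
Clearly the outcome of $(\sigma_i)_{i \in \Pi}$ is the play $\alpha\beta^\omega$.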
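
The punishment function only has to remember which player first left the play
$\alpha\beta^\omega$. The Python sketch below illustrates this bookkeeping; the names
`lasso_vertex` and `punishment`, the use of `None` for $\bot$, and the encoding of $\alpha$
and $\beta$ as lists of vertices (with $\beta$ not repeating the last vertex of $\alpha$ at
its start) are assumptions made for the example.

```python
def lasso_vertex(alpha, beta, k):
    """Vertex at depth k of the play alpha beta^omega."""
    if k < len(alpha):
        return alpha[k]
    return beta[(k - len(alpha)) % len(beta)]

def punishment(history, alpha, beta, owner):
    """Player to punish after the given history, or None if nobody has deviated.

    The history is assumed to start at the initial vertex alpha[0].
    owner[v] is the player controlling vertex v.
    """
    for k in range(1, len(history)):
        if history[k] != lasso_vertex(alpha, beta, k):
            # The first deviation is made by the owner of the previous vertex,
            # and later deviations do not change the punished player.
            return owner[history[k - 1]]
    return None
```

Since the value only changes once along any history, it can be stored in finite memory,
which is consistent with the finite-memory claim of the construction.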

Let us show that $(\sigma_i)_{i \in \Pi}$ is a secure equilibrium in the game $\mathcal{T}$. Assume by contradiction
that there exists a $\prec_j$-profitable deviation $\sigma'_j$ for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$ in $\mathcal{T}$. Let $\tau'_j$ be
the strategy $\sigma'_j$ restricted to $\mathrm{Trunc}_d(\mathcal{T})$. We are going to show that $\tau'_j$ is a $\prec_j$-profitable
deviation for player $j$ w.r.t. $(\tau_i)_{i \in \Pi}$ in $\mathrm{Trunc}_d(\mathcal{T})$, which is impossible by hypothesis. Here
are some useful notations:

$\pi = \langle(\tau_i)_{i \in \Pi}\rangle_{v_0}$ of cost profile $(x_1, \ldots, x_n)$

$\pi' = \langle\tau'_j, \tau_{-j}\rangle_{v_0}$ of cost profile $(x'_1, \ldots, x'_n)$

$\rho = \langle(\sigma_i)_{i \in \Pi}\rangle_{v_0}$ of cost profile $(y_1, \ldots, y_n)$

$\rho' = \langle\sigma'_j, \sigma_{-j}\rangle_{v_0}$ of cost profile $(y'_1, \ldots, y'_n)$.

Notice that the play $\pi'$ coincides with the play $\rho'$ at least until depth $|\alpha|$ (by definition of
$\tau'_j$ and $\sigma_{-j}$); they can differ afterwards. Clearly $\pi$ and $\rho$ coincide at least until depth $|\alpha\beta|$.
The situation is depicted in Fig. 8.




Figure 8: Plays $\pi$ and $\rho$, and their respective deviations $\pi'$ and $\rho'$.

As $\sigma'_j$ is a $\prec_j$-profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$, we have that

$$(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n). \qquad (4.2)$$

Let us show that $\tau'_j$ is a $\prec_j$-profitable deviation for player $j$ w.r.t. $(\tau_i)_{i \in \Pi}$, i.e.,

$$(x_1, \ldots, x_n) \prec_j (x'_1, \ldots, x'_n).$$

By (4.2), one of the next three cases stands.

(1) $y'_j < y_j < +\infty$.

As $\rho = \alpha\beta^\omega$ and $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$, it means that $\alpha$ visits $\mathrm{Goal}_j$, and then $y_j = x_j$.
Since $y'_j < |\alpha|$, we also have $x'_j = y'_j$ (as $\pi'$ and $\rho'$ coincide until depth $|\alpha|$). Therefore
$x'_j < x_j$, and $(x_1, \ldots, x_n) \prec_j (x'_1, \ldots, x'_n)$.

(2) $y'_j < y_j = +\infty$.

If $y'_j \le |\alpha|$, we have again $x'_j = y'_j$. Since $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\pi)$, it follows that $x_j =
y_j = +\infty$. Thus $x'_j < x_j$, and so $(x_1, \ldots, x_n) \prec_j (x'_1, \ldots, x'_n)$.

We show that the case $y'_j > |\alpha|$ is impossible. By definition of $\sigma_{-j}$, the play $\rho'$ is
consistent with $\tau_{-j}$ until depth $|\alpha|$, and then with $\nu^v_{-j}$ from the vertex $v$ visited at depth
$|\alpha|$ (as $y_j = +\infty$). The play $\rho'$ cannot visit $\mathrm{Goal}_j$ after a depth $> |\alpha|$ by definition of $\nu^v_{-j}$.

(3) $y_j = y'_j$, $\forall i \in \Pi\ y_i \le y'_i$ and $\exists i \in \Pi\ y_i < y'_i$.

The fact that $y_j = y'_j$ implies $x_j = y_j \ge x'_j$ (as $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\pi)$). If $x'_j < x_j$, then
$(x_1, \ldots, x_n) \prec_j (x'_1, \ldots, x'_n)$.

We show that the case $x'_j = x_j$ is impossible. We can show that for all $i \in \Pi$ such
that $x_i < +\infty$, we have $x_i \le x'_i$, and that there exists $i \in \Pi$ such that $x_i < x'_i$. Since
$(\tau_i)_{i \in \Pi}$ is deviation-optimized, Lemma 4.6 implies that there exists some $l \in \Pi$ such that
$x_l = +\infty$, and $x'_l \le d_{\mathrm{dev}} = \max\{x_i \mid x_i < +\infty\} + |V|$. As $(\tau_i)_{i \in \Pi}$ is also goal-optimized,
we have that $d_{\mathrm{dev}} \le d_{\mathrm{goal}}(\mathcal{G}) + |V| \le |\alpha|$. As $\rho'$ is consistent with $\tau_{-j}$ until depth $|\alpha|$, it
follows that $y'_l = x'_l < y_l = x_l = +\infty$. Thus case (3) is impossible.

Therefore, each case is either impossible or shows that $(x_i)_{i \in \Pi} \prec_j (x'_i)_{i \in \Pi}$. This is in
contradiction with $(\tau_i)_{i \in \Pi}$ being a secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$, and therefore, $(\sigma_i)_{i \in \Pi}$
is a secure equilibrium in $\mathcal{T}$, thus in $(\mathcal{G}, v_0)$.

It remains to show that $(\sigma_i)_{i \in \Pi}$ is a finite-memory strategy profile. This proof is very
similar to the proof of [BBDP11, Proposition 25] and thus is not given in detail. Roughly
speaking, a finite amount of memory is enough to produce the outcome $\alpha\beta^\omega$; outside of
this outcome it is enough to remember how $(\sigma_i)_{i \in \Pi}$ is defined for histories up to length $|\alpha|$
(after depth $|\alpha|$, memoryless strategies are used). □

Remark 4.7. This proof in fact shows a slightly stronger result: if there exists a
goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$, then there exists a
finite-memory secure equilibrium in $(\mathcal{G}, v_0)$ with the same cost profile.

4.2. Part (ii) of Proposition 4.5. Part (ii) of Proposition 4.5 states that if there exists
a secure equilibrium in a game $(\mathcal{G}, v_0)$, then there exists a goal-optimized and
deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$, for $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$. The proof needs
several steps. Suppose that there exists a secure equilibrium $(\sigma_i)_{i \in \Pi}$ in $(\mathcal{G}, v_0)$. The first
step consists in transforming $(\sigma_i)_{i \in \Pi}$ into a goal-optimized and deviation-optimized secure
equilibrium in $(\mathcal{G}, v_0)$ (Proposition 4.8); the second step in showing that its restriction to
$\mathrm{Trunc}_d(\mathcal{T})$ with $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$ is still a goal-optimized and deviation-optimized secure
equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.

Proposition 4.8. If there exists a secure equilibrium in a game $(\mathcal{G}, v_0)$, then there exists
one in $(\mathcal{G}, v_0)$ which is goal-optimized and deviation-optimized.

To get a goal-optimized equilibrium, the idea is to eliminate some unnecessary cycles (see Definition 1.7). Such an idea has already been developed in [BBDP11, Lemma 19] for Nash equilibria. Unfortunately, this lemma cannot be applied to secure equilibria (as shown in Example 4.9). Adapting it to the context of secure equilibria is not trivial and the underlying constructions are more involved: we need to modify the strategies of the coalition against a deviating player. By using specific punishing strategies for the coalitions, we are then able to get a goal-optimized equilibrium that is also deviation-optimized, due to the particular form of these strategies.

Example 4.9. Consider the three-player game of Fig. 9 initialized at $A$, where $V_1 = \{A, C, D\}$, $V_2 = \{B\}$ and $V_3 = \emptyset$, $\mathrm{Goal}_1 = \mathrm{Goal}_2 = \{A\}$ and $\mathrm{Goal}_3 = \{D\}$. The strategy profile $(\sigma_1, \sigma_2, \sigma_3)$ defined below is a secure equilibrium whose outcome is $ABCBD^\omega$ and cost profile $(0, 0, 5)$ (the strategy $\sigma_3$ of player 3 does not have to be defined, as $V_3 = \emptyset$):

$$\sigma_1(h) = \begin{cases} B & \text{if } h = A \text{ or } ABC,\\ D & \text{otherwise,} \end{cases} \qquad \sigma_2(h) = \begin{cases} C & \text{if } h = AB,\\ D & \text{otherwise.} \end{cases}$$


In Example 1.8 we gave two equilibria whose outcome has unnecessary cycles. Here, we also face such a situation, with the cycle $BCB$. If we modify $(\sigma_1, \sigma_2, \sigma_3)$ in order to remove this cycle, as done in [BBDP11, Lemma 19] for Nash equilibria, the resulting strategy profile is a Nash equilibrium with outcome $ABD^\omega$ and cost profile $(0, 0, 3)$; however, it is no longer a secure equilibrium. Indeed player 1 has a $\prec_1$-profitable deviation by taking the edge $(A, D)$ instead of $(A, B)$, which leads to a cost of 4 for player 3 (instead of 3). In the sequel we show how to modify the approach of [BBDP11, Lemma 19] in a way to keep the property of secure equilibrium.
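The comparison made in Example 4.9 can be replayed numerically. The sketch below is only illustrative: it assumes that $x \prec_j y$ holds when player $j$ pays strictly less in $y$, or pays the same while no player pays less and at least one pays more. Since Fig. 9 is not reproduced here, the edge lengths in `WEIGHTS` are hypothetical values chosen so that the costs quoted in the example come out; the helper names are ours as well.

```python
import math

# Hypothetical edge lengths (assumption): picked to match the costs quoted in Example 4.9.
WEIGHTS = {("A", "B"): 1, ("B", "C"): 1, ("C", "B"): 1, ("B", "D"): 2, ("A", "D"): 4}
GOALS = [{"A"}, {"A"}, {"D"}]  # goal sets of players 1, 2, 3 (stored at indices 0, 1, 2)


def cost_profile(prefix, weights, goals):
    """Accumulated length until each goal set is first visited (inf if never visited)."""
    costs = [math.inf] * len(goals)
    elapsed = 0
    for idx, vertex in enumerate(prefix):
        if idx > 0:
            elapsed += weights[(prefix[idx - 1], vertex)]
        for i, goal in enumerate(goals):
            if costs[i] == math.inf and vertex in goal:
                costs[i] = elapsed
    return costs


def prefers(j, x, y):
    """True when x is strictly worse than y for player j (assumed definition of the relation)."""
    if y[j] < x[j]:
        return True
    pairs = list(zip(x, y))
    return y[j] == x[j] and all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


original = cost_profile("ABCBD", WEIGHTS, GOALS)   # outcome with the cycle BCB
shortcut = cost_profile("ABD", WEIGHTS, GOALS)     # outcome after removing the cycle
deviation = cost_profile("AD", WEIGHTS, GOALS)     # player 1 takes the edge (A, D)
print(prefers(0, shortcut, deviation), prefers(0, original, deviation))
```

Under these assumptions the first comparison holds and the second does not, which mirrors the text: the deviation through $(A, D)$ is profitable against the shortened profile but not against the original one.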




Figure 9: A three-player game with $\mathrm{Goal}_1 = \mathrm{Goal}_2 = \{A\}$ and $\mathrm{Goal}_3 = \{D\}$.



In order to prove Proposition 4.8, we need three lemmas: Lemmas 4.11, 4.12 and 4.13. Given a secure equilibrium, Lemma 4.11 describes some particular memoryless strategies for the coalition when a player deviates. Lemma 4.12 (counterpart of [BBDP11, Lemma 19] for secure equilibria) states that we can remove a cycle from the outcome of a secure equilibrium, but the strategies have to be somewhat modified with these specific coalition strategies. This lemma is used in the proof of Proposition 4.8 to get a goal-optimized secure equilibrium. Lemma 4.13 states that we can also get a deviation-optimized secure equilibrium.

Memoryless coalition strategies. Given a secure equilibrium in a game $(\mathcal{G}, v_0)$, we here prove the existence of interesting memoryless strategies for the coalition against a deviating player.

Let us first introduce the definition of a $j$-promising history for some deviating player $j$. Intuitively player $j$ deviates from a strategy profile $(\sigma_i)_{i\in\Pi}$ and constructs a history $h$ consistent with $\sigma_{-j}$. This history $h$ is called $j$-promising w.r.t. $(\sigma_i)_{i\in\Pi}$ if player $j$ does not know yet if this deviation will be $\prec_j$-profitable for him w.r.t. $(\sigma_i)_{i\in\Pi}$, but he can still hope that it will be, without knowing what he will play after $h$.

Definition 4.10. Let $(\sigma_i)_{i\in\Pi}$ be a strategy profile in a game $(\mathcal{G}, v_0)$, with cost profile $(x_i)_{i\in\Pi}$. Let us assume that $\Pi = \{1, \ldots, n\}$ and

$$x_1 \le \ldots \le x_k \le x_{k+1} \le \ldots \le x_n$$

where $0 \le k \le n$. Let $h$ be a history of the game such that $x_k \le |h| < x_{k+1}$.

For any player $j \in \Pi$, we say that $h$ is $j$-promising w.r.t. $(\sigma_i)_{i\in\Pi}$ if $h$ is consistent with $\sigma_{-j}$ and if

• in the case where $x_{k+1} < +\infty$:

  - if $j \le k$, we have that $\mathrm{Cost}_j(h) = x_j$ and $\forall i \in \Pi$ $\mathrm{Cost}_i(h) \ge x_i$,

  - if $j > k$, we have that $\mathrm{Cost}_j(h) = +\infty$;

• in the case where $x_{k+1} = +\infty$:

  $\mathrm{Cost}_j(h) = x_j$, $\forall i \in \Pi$ $\mathrm{Cost}_i(h) \ge x_i$ and $\exists i \in \Pi$ $\mathrm{Cost}_i(h) > x_i$.

In the case where $x_{k+1} < +\infty$ and $j \le k$, along $h$, player $j$ has been able to get the same cost as along the outcome $\rho$ of $(\sigma_i)_{i\in\Pi}$ ($\mathrm{Cost}_j(h) = x_j$) and to not decrease the cost of the other players ($\mathrm{Cost}_i(h) \ge x_i$). After $h$, he hopes to be able to play such that the resulting deviation $h\rho'$ will satisfy $(x_i)_{i\in\Pi} \prec_j \mathrm{Cost}(h\rho')$. In the case where $j > k$, player $j$ has not visited his goal set along $h$, so he does not know yet if his deviation will be $\prec_j$-profitable for him. However he hopes to visit it early enough after $h$ along $h\rho'$, such that $\mathrm{Cost}_j(h\rho') < x_j$, or to get the same cost while increasing the cost of the other players in a way that $(x_i)_{i\in\Pi} \prec_j \mathrm{Cost}(h\rho')$.

In the case where $x_{k+1} = +\infty$, the history $\rho_{\le |h|}$ has visited all the goal sets $\mathrm{Goal}_i$ such that $\mathrm{Cost}_i(\rho) < +\infty$. Thus player $j$ could have a $\prec_j$-profitable deviation $h\rho'$ if he can avoid visiting the goal sets $\mathrm{Goal}_i$, where $i \ge k + 1$ ($i \ne j$).

Given a $j$-promising history $h$ of player $j$, the next lemma describes the existence of interesting memoryless strategies of the coalition $\Pi \setminus \{j\}$ from the last vertex of $h$. This lemma uses some qualitative two-player zero-sum reachability-under-safety games $\mathcal{G}_{-j} = (G_{-j}, R, S)$ associated with the coalition $\Pi \setminus \{j\}$ (where $G_{-j} = (V, V \setminus V_j, V_j, E)$). In such games, the coalition $\Pi \setminus \{j\}$ aims at reaching $R$ while staying in $S$, and player $j$ wants to prevent this.
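Such reachability-under-safety games can be solved by a standard attractor computation restricted to $S$. The sketch below is a minimal illustration, not the construction used in the paper: the function name and the graph encoding are ours, and it assumes that the target vertex itself must lie in $S$. It returns the set of vertices from which the coalition wins together with a memoryless choice for each winning coalition vertex.

```python
def coalition_winning_region(vertices, edges, coalition_vertices, R, S):
    """Vertices from which the coalition can force a visit to R while the play stays in S.

    vertices: list of vertices; edges: list of (u, w) pairs;
    coalition_vertices: vertices controlled by the coalition (the others belong to player j).
    """
    succ = {v: [w for (u, w) in edges if u == v] for v in vertices}
    win = {v for v in vertices if v in R and v in S}
    strategy = {}  # memoryless choice of the coalition on its winning vertices
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in win or v not in S:
                continue
            good = [w for w in succ[v] if w in win]
            if v in coalition_vertices:
                if good:  # one winning successor suffices for the coalition
                    win.add(v)
                    strategy[v] = good[0]
                    changed = True
            elif succ[v] and len(good) == len(succ[v]):
                # player j cannot escape: all his successors are already winning
                win.add(v)
                changed = True
    return win, strategy


V = ["a", "b", "c", "d"]
E = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "c"), ("d", "d")]
print(coalition_winning_region(V, E, {"a", "d"}, R={"d"}, S={"a", "b", "c", "d"}))
```

Each round of the loop adds vertices one step further from $R$, so following `strategy` reaches $R$ along a path that never repeats a vertex. This is the reason memoryless winning strategies give a bound in terms of $|V|$ on the number of steps needed.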

Lemma 4.11. Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in a game $(\mathcal{G}, v_0)$, with outcome $\rho$ and cost profile $(x_i)_{i\in\Pi}$. Let $h$ be a $j$-promising history w.r.t. $(\sigma_i)_{i\in\Pi}$ for some player $j \in \Pi$. Let us assume w.l.o.g. that $\Pi = \{1, \ldots, n\}$. If

$$x_1 \le \ldots \le x_k \le |h| < |h| + |V| < x_{k+1} \le \ldots \le x_l < x_{l+1} = \ldots = x_n = +\infty,$$

where $0 \le k \le l \le n$, then the coalition $\Pi \setminus \{j\}$ has a memoryless winning strategy $\mu^v_{-j}$ from $v = \mathrm{Last}(h)$ in the qualitative two-player zero-sum game $\mathcal{G}_{-j} = (G_{-j}, R, S)$ where

• if $j \le k$, then $R = \bigcup_{i > k} \mathrm{Goal}_i$, and $S = V$,

• if $k < j \le l$, then $R = V$, and $S = V \setminus \mathrm{Goal}_j$,

• if $l < j$ and $\mathrm{Cost}(\rho_{\le |h|}) \prec_j \mathrm{Cost}(h)$, then $R = \bigcup_{i > k,\, i \ne j} \mathrm{Goal}_i$, and $S = V \setminus \mathrm{Goal}_j$,

• if $l < j$ and $\mathrm{Cost}(\rho_{\le |h|}) \not\prec_j \mathrm{Cost}(h)$, then $R = V$, and $S = V \setminus \mathrm{Goal}_j$.

In this lemma, either all goal sets are visited by $\rho$ and $l = n$, or $l < n$ and the last visited goal set is $\mathrm{Goal}_l$. Also notice that $R \ne \emptyset$ in all cases. Indeed, $k \ne n$ as $h$ is $j$-promising, and then the set $R$ in the case $j \le k$ of this lemma is not empty. In the third case, it is not empty either, otherwise we would have $k + 1 = l + 1 = n = j$ but such a situation is impossible because $h$ is $j$-promising w.r.t. $(\sigma_i)_{i\in\Pi}$ (see the last case of Definition 4.10) and $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium.

Proof of Lemma 4.11. By contradiction assume that the coalition $\Pi \setminus \{j\}$ has no winning
strategy from $v$ in the game $\mathcal{G}_{-j} = (G_{-j}, R, S)$, i.e. no winning strategy from $v$ to reach $R$
while staying in S. By Corollary ll.ll[ it implies that player j has a memoryless winning 
strategy $\mu_j$ from $v$ to stay outside $R$ or to reach $V \setminus S$. Recall that $h$ is consistent with $\sigma_{-j}$ as it is $j$-promising w.r.t. $(\sigma_i)_{i\in\Pi}$. Let $\rho'$ be the play with prefix $h$ that is consistent with $\sigma_{-j}$, and with $\mu_j$ from $v$ (see Fig. 10). In the four cases of the lemma, we then prove that $(x_i)_{i\in\Pi} \prec_j (\mathrm{Cost}_i(\rho'))_{i\in\Pi}$, meaning that player $j$ has a $\prec_j$-profitable deviation w.r.t. $(\sigma_i)_{i\in\Pi}$, which is impossible.




Figure 10: Play $\rho$ and its deviation $\rho'$ with prefix $h$.

• $j \le k$.

The strategy $\mu_j$ enables player $j$ to avoid all goal sets $\mathrm{Goal}_i$ where $i > k$. As $h$ is $j$-promising, we have that $\mathrm{Cost}_j(h) = x_j$ and $\forall i \in \Pi$, $\mathrm{Cost}_i(h) \ge x_i$. By construction of $\rho'$ and as $x_k \le |h| < x_{k+1}$, we have that

$\mathrm{Cost}_j(\rho') = \mathrm{Cost}_j(h) = x_j$,

$\forall i \le k$, $\mathrm{Cost}_i(\rho') \ge x_i$,

$\forall i > k$, $\mathrm{Cost}_i(\rho') = +\infty$.

Then for all $i \in \Pi$, we have that $\mathrm{Cost}_i(\rho') \ge x_i$. It remains to show that the cost of one player is strictly increased in $\rho'$ compared with $\rho$. In the case where $x_{k+1} < +\infty$, i.e. $k < l$, we have in particular that $x_l < +\infty$ and $\mathrm{Cost}_l(\rho') = +\infty$. And in the case where $x_{k+1} = +\infty$ ($k = l$), we have that $(x_i)_{i\in\Pi} \prec_j \mathrm{Cost}(h)$ (by definition of $j$-promising), i.e. there exists $i \in \Pi$ such that $x_i < \mathrm{Cost}_i(h)$. Either $\mathrm{Cost}_i(h) = \mathrm{Cost}_i(\rho')$ and then $x_i < \mathrm{Cost}_i(\rho')$, or $\mathrm{Cost}_i(h) = +\infty > \mathrm{Cost}_i(\rho')$ and so $x_i \le |h| < \mathrm{Cost}_i(\rho')$. In both cases, it implies that $(x_i)_{i\in\Pi} \prec_j (\mathrm{Cost}_i(\rho'))_{i\in\Pi}$.

• $k < j \le l$.

As $\mu_j$ is memoryless, this strategy enables player $j$ to reach his goal set $\mathrm{Goal}_j$ from $v$ within $|V|$ steps. Thus, we have that

$\mathrm{Cost}_j(\rho') \le |h| + |V| < x_{k+1} \le x_j$

since $k < j \le l$, and so, $(x_i)_{i\in\Pi} \prec_j (\mathrm{Cost}_i(\rho'))_{i\in\Pi}$.

• $l < j$ and $\mathrm{Cost}(\rho_{\le |h|}) \prec_j \mathrm{Cost}(h)$.

The strategy $\mu_j$ enables player $j$ to avoid all goal sets $\mathrm{Goal}_i$ where $i > k$ and $i \ne j$, or to visit the goal set $\mathrm{Goal}_j$. On one hand, if $\rho'$ visits $\mathrm{Goal}_j$, then

$\mathrm{Cost}_j(\rho') < +\infty = x_j$

as $j > l$, and so, $(x_i)_{i\in\Pi} \prec_j (\mathrm{Cost}_i(\rho'))_{i\in\Pi}$. On the other hand, if $\rho'$ does not visit $\mathrm{Goal}_j$, then $\rho'$ does not visit either any $\mathrm{Goal}_i$ with $i > k$. Since $\mathrm{Cost}(\rho_{\le |h|}) \prec_j \mathrm{Cost}(h)$, the situation is quite similar to the first case, and we can deduce that

$\mathrm{Cost}_j(\rho') = x_j = +\infty$,

$\forall i \le k$, $\mathrm{Cost}_i(\rho') \ge x_i$,

$\forall i > k$, $\mathrm{Cost}_i(\rho') = +\infty$.

Thus, for all $i \in \Pi$, we have that $\mathrm{Cost}_i(\rho') \ge x_i$. Moreover, exactly like in the case $j \le k$, we can show that there exists $i \in \Pi$ such that $x_i < \mathrm{Cost}_i(\rho')$. Then it implies that $(x_i)_{i\in\Pi} \prec_j (\mathrm{Cost}_i(\rho'))_{i\in\Pi}$.
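• $l < j$ and $\mathrm{Cost}(\rho_{\le |h|}) \not\prec_j \mathrm{Cost}(h)$.

Like in the second case, the strategy $\mu_j$ enables player $j$ to reach his goal set $\mathrm{Goal}_j$ from $v$. Then we have that

$\mathrm{Cost}_j(\rho') < +\infty = x_j$

and so, $(x_i)_{i\in\Pi} \prec_j (\mathrm{Cost}_i(\rho'))_{i\in\Pi}$. □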



Removing a cycle. The next lemma states that it is possible to modify the strategy profile of a secure equilibrium in a way to eliminate an unnecessary cycle in its outcome. In the notations of this lemma, notice that $\beta$ is the eliminated cycle (condition $\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$), and that a new goal set is visited after $\alpha\beta\gamma$ (condition $\mathrm{Visit}(\rho) \ne \mathrm{Visit}(\alpha)$). The elimination of the cycle is possible by modifying the strategies of the coalitions into strategies as described in Lemma 4.11.

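Lemma 4.12. Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in a game $(\mathcal{G}, v_0)$, with outcome $\rho$. Suppose that $\rho = \alpha\beta\gamma\tilde{\rho}$, with $\beta$ non-empty and $|\gamma| \ge |V|$, such that

$\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$

$\mathrm{Visit}(\rho) \ne \mathrm{Visit}(\alpha)$

$\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$.

Then there exists a secure equilibrium $(\tau_i)_{i\in\Pi}$ in $(\mathcal{G}, v_0)$ with outcome $\alpha\gamma\tilde{\rho}$.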

Proof. Let $(x_i)_{i\in\Pi}$ be the cost profile of $\rho$. Let us assume w.l.o.g. that $\Pi = \{1, \ldots, n\}$ and

$$x_1 \le \ldots \le x_k \le |\alpha| < |\alpha\beta\gamma| < x_{k+1} \le \ldots \le x_l < x_{l+1} = \ldots = x_n = +\infty,$$

where $0 \le k \le l \le n$ (remark that $k < l$ as $\mathrm{Visit}(\rho) \ne \mathrm{Visit}(\alpha)$).

Let us define the required strategy profile $(\tau_i)_{i\in\Pi}$ with the aim to get the outcome $\alpha\gamma\tilde{\rho}$ by eliminating $\beta$ in $\rho$. For all $i \in \Pi$ and all histories $h \in H_i$, we set

$$\tau_i(h) := \begin{cases} \sigma_i(\alpha\beta\delta) & \text{if } h = \alpha\delta,\\ \text{arbitrary} & \text{if } P(h) = i,\\ \mu^v_{i,P(h)}(h) & \text{if } \alpha \not\le h,\ P(h) \ne \bot, i, \text{ and } \exists h'v \text{ that is } P(h)\text{-promising w.r.t. } (\sigma_i)_{i\in\Pi} \text{ and verifies } h'v \le h \text{ and } |h'v| = |\alpha|,\\ \sigma_i(h) & \text{otherwise.} \end{cases}$$

In this definition, arbitrary means that the next vertex is chosen arbitrarily, and the punishment function $P$ is defined as in the proof of Proposition 4.5 Part (i) (adapted to the play $\alpha\gamma\tilde{\rho}$). Moreover, when a player $j$ deviates, each player $i \ne j$ plays according to $\sigma_i$, except in the case of a $j$-promising history $h$ of length $|\alpha|$ from which he plays according to $\mu^v_{-j}$, with $v = \mathrm{Last}(h)$ (see Lemma 4.11). Notation $\mu^v_{i,j}$ means the memoryless strategy of player $i$ induced by $\mu^v_{-j}$.
We observe that the outcome of $(\tau_i)_{i\in\Pi}$ is the play $\pi = \alpha\gamma\tilde{\rho}$ (see Fig. 11 and 12). Let us write its cost profile as $(y_1, \ldots, y_n)$. It follows that for all $i \in \Pi$, $y_i \le x_i$. More precisely,

- if $i \le k$, then $y_i = x_i$; (4.3)

- if $k < i \le l$, then $y_i = x_i - (|\beta| + 1)$; (4.4)

- if $i > l$, then $y_i = x_i = +\infty$. (4.5)
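Equations (4.3) to (4.5) can be checked on a small instance. The sketch below assumes that the cost of a player is the index of the first vertex of the play lying in his goal set and that $|\beta|$ counts edges, so that $\beta$ consists of $|\beta| + 1$ vertices; the play, the goal sets and the function names are hypothetical and only serve the illustration.

```python
import math


def first_visit_costs(prefix, goals):
    """Index of the first vertex of the prefix in each goal set (inf if there is none)."""
    return [next((t for t, v in enumerate(prefix) if v in goal), math.inf) for goal in goals]


def remove_cycle(prefix, start, end):
    """Delete beta = prefix[start:end], where alpha = prefix[:start] and Last(alpha) = Last(alpha beta)."""
    assert start >= 1 and prefix[start - 1] == prefix[end - 1], "beta must close a cycle"
    return prefix[:start] + prefix[end:]


# alpha = AB, beta = CB, gamma = CBCBC (4 edges, with |V| = 4), then the play reaches D.
play = list("ABCBCBCBCD")
goals = [{"A"}, {"D"}, {"E"}]  # the third goal set is never visited

x = first_visit_costs(play, goals)
y = first_visit_costs(remove_cycle(play, 2, 4), goals)
print(x, y)
```

Here the first player keeps his cost (4.3), the second one gains exactly the two removed vertices, that is $|\beta| + 1$ with $|\beta| = 1$ (4.4), and the third one keeps an infinite cost (4.5).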



Figure 11: Play $\rho$.

Figure 12: Play $\pi$ and possible deviations.

Assume that there exists a $\prec_j$-profitable deviation $\tau'_j$ for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$. Let $\pi'$ be the outcome of the strategy profile $(\tau'_j, \tau_{-j})$ from $v_0$, and $(y'_1, \ldots, y'_n)$ its cost profile. Then we know that $(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n)$. Two possible situations occur according to where player $j$ deviates from $\pi$. We show that the first situation is impossible. In the second one, we construct a $\prec_j$-profitable deviation $\sigma'_j$ for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$, and then get a contradiction with $(\sigma_i)_{i\in\Pi}$ being a secure equilibrium.

(i) player $j$ deviates from $\pi$ strictly before depth $|\alpha|$ (see the play $\pi'_1$ in Fig. 12).

Let us consider the prefix $h$ of $\pi'$ of length $|\alpha|$. We first state that $h$ cannot visit $\mathrm{Goal}_j$ in a way that $\mathrm{Cost}_j(h) < x_j$, because $h$ is consistent with $\sigma_{-j}$ (by definition of $(\tau_i)_{i\in\Pi}$) and $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium. Therefore, $h$ is a $j$-promising history w.r.t. $(\sigma_i)_{i\in\Pi}$, as $\tau'_j$ is a $\prec_j$-profitable deviation w.r.t. $(\tau_i)_{i\in\Pi}$. By definition of $(\tau_i)_{i\in\Pi}$, $\pi'$ is consistent with $\mu^v_{-j}$ from $v = \mathrm{Last}(h)$. We consider the four possible cases of Lemma 4.11.

• $j \le k$.

We have that $y_j = y'_j$. The coalition $\Pi \setminus \{j\}$ forces the play $\pi'$ to visit $\mathrm{Goal}_i$, for a certain $i > k$ (let us recall that $k < n$), before depth $|\alpha| + |V|$ as $\mu^v_{-j}$ is memoryless. And so, $y'_i \le |\alpha| + |V| \le |\alpha| + |\gamma| < y_{k+1} \le y_i$ (as $|\alpha\beta\gamma| < x_{k+1}$ and by Eq. (4.4)). This contradicts the fact that $(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n)$.

• $k < j \le l$.

The coalition $\Pi \setminus \{j\}$ prevents the play $\pi'$ from visiting $\mathrm{Goal}_j$, and so, $y'_j = +\infty$. As $y_j < +\infty$, it cannot be the case that $(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n)$.

• $l < j$ and $\mathrm{Cost}(\rho_{\le |h|}) \prec_j \mathrm{Cost}(h)$.

The coalition $\Pi \setminus \{j\}$ forces the play $\pi'$ to visit $\mathrm{Goal}_i$, for a certain $i > k$, $i \ne j$, before depth $|\alpha| + |V|$, while avoiding the visit of $\mathrm{Goal}_j$ (then, $y_j = y'_j = +\infty$). As in the first case, this leads to a contradiction with the fact that $(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n)$.

• $l < j$ and $\mathrm{Cost}(\rho_{\le |h|}) \not\prec_j \mathrm{Cost}(h)$.

Like in the second case, the coalition $\Pi \setminus \{j\}$ prevents the play $\pi'$ from visiting $\mathrm{Goal}_j$, and so, $y_j = y'_j = +\infty$. Moreover, the hypothesis $\mathrm{Cost}(\rho_{\le |h|}) \not\prec_j \mathrm{Cost}(h)$ implies that $(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n)$ cannot be true.
(ii) player $j$ deviates from $\pi$ after depth $|\alpha|$ ($\pi$ and $\pi'$ coincide at least on $\alpha$, see the play $\pi'_2$ in Fig. 12).

We define for all histories $h \in H_j$:

$$\sigma'_j(h) := \begin{cases} \sigma_j(h) & \text{if } \alpha\beta \not\le h,\\ \tau'_j(\alpha\delta) & \text{if } h = \alpha\beta\delta. \end{cases}$$

Let us set $\rho' = \langle \sigma'_j, \sigma_{-j} \rangle_{v_0}$ of cost profile $(x'_1, \ldots, x'_n)$. As player $j$ deviates after $\alpha$ with the strategy $\tau'_j$, one can prove that

$\pi' = \alpha\tilde{\pi}'$ and $\rho' = \alpha\beta\tilde{\pi}'$

by definition of $(\tau_i)_{i\in\Pi}$ (see the play $\rho'_2$ in Fig. 11). Since $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta)$, Equations (4.3), (4.4) and (4.5) also stand by replacing $x_i$ with $x'_i$ and $y_i$ with $y'_i$ (but the value of $l$ might be different). Then

$(x_1, \ldots, x_n) \prec_j (x'_1, \ldots, x'_n)$ iff $(y_1, \ldots, y_n) \prec_j (y'_1, \ldots, y'_n)$,

which proves that $\sigma'_j$ is a $\prec_j$-profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$, and this is a contradiction. □



Goal- and deviation-optimized secure equilibrium. The next lemma uses the ideas developed in the proof of Lemma 4.12 to show that any secure equilibrium can be transformed into one that is deviation-optimized. It is the last step before proving Proposition 4.8 and finally Part (ii) of Proposition 4.5.

Lemma 4.13. Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in a game $(\mathcal{G}, v_0)$, with outcome $\rho$. Then there exists a deviation-optimized secure equilibrium $(\tau_i)_{i\in\Pi}$ in $(\mathcal{G}, v_0)$ with outcome $\rho$.

Proof. Let $\alpha$ be the prefix of $\rho$ of length $\max\{\mathrm{Cost}_i(\rho) \mid \mathrm{Cost}_i(\rho) < +\infty\}$. It follows that $\mathrm{Visit}(\rho) = \mathrm{Visit}(\alpha)$. Then we define the required strategy profile $(\tau_i)_{i\in\Pi}$ exactly like in the proof of Lemma 4.12. We only remove the first line of the definition: $\tau_i(h) = \sigma_i(\alpha\beta\delta)$ if $h = \alpha\delta$. One can be convinced that $(\tau_i)_{i\in\Pi}$ and $(\sigma_i)_{i\in\Pi}$ have the same outcome $\rho$. We prove in the exact same way that $(\tau_i)_{i\in\Pi}$ is a secure equilibrium in $(\mathcal{G}, v_0)$ (here, $k = l$).

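Let us now show that $(\tau_i)_{i\in\Pi}$ is deviation-optimized thanks to Lemma 4.6. Let $\tau'_j$ be a strategy of some player $j$ such that the play $\rho' = \langle \tau'_j, \tau_{-j} \rangle_{v_0}$ verifies

(i) $\mathrm{Cost}_j(\rho) = \mathrm{Cost}_j(\rho')$,

(ii) $\forall i \in \Pi$ such that $\mathrm{Cost}_i(\rho) < +\infty$, we have that $\mathrm{Cost}_i(\rho) \le \mathrm{Cost}_i(\rho')$,

(iii) $\exists i \in \Pi$ $\mathrm{Cost}_i(\rho) < \mathrm{Cost}_i(\rho')$.

We must prove that there exists $l$ such that $\mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\rho') < d_{\mathrm{dev}} = \max\{\mathrm{Cost}_i(\rho) \mid \mathrm{Cost}_i(\rho) < +\infty\} + |V|$. Notice that $\mathrm{Cost}(\rho) = \mathrm{Cost}(\alpha)$.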
On one hand, suppose that $\mathrm{Cost}(\alpha) \not\prec_j \mathrm{Cost}(\rho'_{\le |\alpha|})$. By (i), (ii) and (iii), the only possibility is to have some $l$ such that $\mathrm{Cost}_l(\alpha) = +\infty$ and $\mathrm{Cost}_l(\rho'_{\le |\alpha|}) < +\infty$, that is, $\mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\rho') \le |\alpha| < d_{\mathrm{dev}}$.

On the other hand, if $\mathrm{Cost}(\alpha) \prec_j \mathrm{Cost}(\rho'_{\le |\alpha|})$, then according to the last case of Definition 4.10, $\rho'_{\le |\alpha|}$ is $j$-promising w.r.t. $(\sigma_i)_{i\in\Pi}$. Indeed, $\rho'_{\le |\alpha|}$ is consistent with $\sigma_{-j}$, and there exists $i \in \Pi$ such that $\mathrm{Cost}_i(\rho) = +\infty$ (otherwise it would contradict the fact that $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium). By definition of $(\tau_i)_{i\in\Pi}$, $\rho'$ is thus consistent with $\mu^v_{-j}$ from vertex $v = \rho'_{|\alpha|}$. Thus, by Lemma 4.11 (first case or third case), there exists $l$ such that $\mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\rho') < |\alpha| + |V| = d_{\mathrm{dev}}$ (as $\mu^v_{-j}$ is memoryless).

In both cases, by Lemma 4.6 we proved that $(\tau_i)_{i\in\Pi}$ is deviation-optimized. □

We are now able to prove Proposition 4.8, which states that if there exists a secure equilibrium in a game $(\mathcal{G}, v_0)$, then there exists one which is goal-optimized and deviation-optimized.

Proof of Proposition 4.8. Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in $(\mathcal{G}, v_0)$ with outcome $\rho = \langle (\sigma_i)_{i\in\Pi} \rangle_{v_0}$ and cost profile $(x_i)_{i\in\Pi}$. Let us assume w.l.o.g. that $\Pi = \{1, \ldots, n\}$ and

$$x_1 \le \ldots \le x_l < x_{l+1} = \ldots = x_n = +\infty$$

where $0 \le l \le n$. Let us set $x_0 = 0$. For all $k \in \{0, 1, \ldots, l - 1\}$ such that $(x_{k+1} - x_k) > 2 \cdot |V|$ and while it is still the case, we apply the following procedure to get a goal-optimized secure equilibrium.

Consider such a $k \in \{0, 1, \ldots, l - 1\}$. Then, we can write $\rho = \alpha\beta\gamma\tilde{\rho}$, with $\beta$ non-empty, $|\gamma| \ge |V|$ and such that

$x_k \le |\alpha| < |\alpha\beta\gamma| < x_{k+1}$

$\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma) = \{1, \ldots, k\}$

$\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$.

Let us remark that $\mathrm{Visit}(\rho) \ne \mathrm{Visit}(\alpha)$ as $k < l$. By Lemma 4.12 there exists a secure equilibrium in $(\mathcal{G}, v_0)$ with outcome $\alpha\gamma\tilde{\rho}$. Its cost profile $(y_i)_{i\in\Pi}$ is such that

$y_1 = x_1, \ldots, y_k = x_k$;

$y_{k+1} < x_{k+1}, \ldots, y_l < x_l$;

$y_{l+1} = x_{l+1} = +\infty, \ldots, y_n = x_n = +\infty$.

By applying this procedure finitely many times, we can assume w.l.o.g. that $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium with a cost profile $(x_1, \ldots, x_n)$ such that

$x_i \le i \cdot 2 \cdot |V|$ for $i \le l$

$x_i = +\infty$ for $i > l$,

meaning that $(\sigma_i)_{i\in\Pi}$ is a goal-optimized secure equilibrium.

Moreover, by Lemma 4.13 there exists a deviation-optimized secure equilibrium with the same outcome, i.e. a goal-optimized and deviation-optimized secure equilibrium. And this concludes the proof. □
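The bound on the costs and the termination of the procedure can be spelled out as follows. This is a sketch of the arithmetic only, under the reading that the procedure stops exactly when every gap satisfies $x_{k+1} - x_k \le 2 \cdot |V|$ for $0 \le k < l$, with $x_0 = 0$.

```latex
% Bound: telescoping sum over the gaps, each at most 2|V| when the procedure stops.
\begin{align*}
x_i \;=\; x_0 + \sum_{k=0}^{i-1} (x_{k+1} - x_k)
    \;\le\; 0 + \sum_{k=0}^{i-1} 2 \cdot |V|
    \;=\; i \cdot 2 \cdot |V|
    \qquad (1 \le i \le l).
\end{align*}
% Termination: each application keeps y_1, ..., y_k and strictly decreases
% y_{k+1}, ..., y_l, so the sum of the finite costs strictly decreases.
\begin{align*}
\sum_{i=1}^{l} y_i \;<\; \sum_{i=1}^{l} x_i ,
\qquad \text{with } \sum_{i=1}^{l} x_i \in \mathbb{N}.
\end{align*}
```

Since a strictly decreasing sequence of natural numbers is finite, the procedure can only be applied finitely many times.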
Remark 4.14. Regarding the costs, this proof shows that if there exists a secure equilibrium with cost profile $(a_i)_{i\in\Pi}$ in a game $(\mathcal{G}, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium with cost profile $(b_i)_{i\in\Pi}$ in $(\mathcal{G}, v_0)$, such that for all $i \in \Pi$, $b_i \le a_i$. In particular, the cost profile is usually not preserved.

Finally, on the basis of Proposition 4.8 we are able to prove Part (ii) of Proposition 4.5: given a game $(\mathcal{G}, v_0)$, if there exists a secure equilibrium in $(\mathcal{G}, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$, for $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$.

Proof of Proposition 4.5, Part (ii). Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in $(\mathcal{G}, v_0)$ with outcome $\rho$. By Proposition 4.8 we can suppose w.l.o.g. that $(\sigma_i)_{i\in\Pi}$ is goal-optimized and deviation-optimized. Let us define the strategy profile $(\tau_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$ as the strategy profile $(\sigma_i)_{i\in\Pi}$ restricted to the finite tree $\mathrm{Trunc}_d(T)$. We prove that $(\tau_i)_{i\in\Pi}$ is a secure equilibrium in $\mathrm{Trunc}_d(T)$, which is clearly goal-optimized ($d > d_{\mathrm{goal}}(\mathcal{G})$).

For a contradiction, assume that player $j$ has a $\prec_j$-profitable deviation $\tau'_j$ w.r.t. $(\tau_i)_{i\in\Pi}$. Let us denote $\pi = \langle (\tau_i)_{i\in\Pi} \rangle_{v_0}$ and $\pi' = \langle \tau'_j, \tau_{-j} \rangle_{v_0}$ in $\mathrm{Trunc}_d(T)$. We extend arbitrarily $\tau'_j$ in $T$, into a strategy denoted $\sigma'_j$, and let $\rho' = \langle \sigma'_j, \sigma_{-j} \rangle_{v_0}$. Let us remark that $\pi$ (resp. $\pi'$) is a prefix of $\rho$ (resp. $\rho'$) of length $d > d_{\mathrm{goal}}(\mathcal{G})$, and thus, in particular $\mathrm{Cost}(\rho) = \mathrm{Cost}(\pi)$. Moreover, it is impossible that $\mathrm{Cost}_j(\pi) > \mathrm{Cost}_j(\pi')$, otherwise we would have $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ and so, get a contradiction with the fact that $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium in $T$. Then, player $j$ gets the same cost $\mathrm{Cost}_j(\pi) = \mathrm{Cost}_j(\pi')$ and

$\forall i \in \Pi$ $\mathrm{Cost}_i(\pi) \le \mathrm{Cost}_i(\pi')$ $\wedge$ $\exists i \in \Pi$ $\mathrm{Cost}_i(\pi) < \mathrm{Cost}_i(\pi')$.

We now show that $\mathrm{Cost}_j(\rho) = \mathrm{Cost}_j(\rho')$. In the case where $\mathrm{Cost}_j(\pi) = \mathrm{Cost}_j(\pi') = +\infty$ ($= \mathrm{Cost}_j(\rho)$), we must have $\mathrm{Cost}_j(\rho') = +\infty$. Otherwise, it would contradict the fact that $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium in $T$. In the case where $\mathrm{Cost}_j(\pi) = \mathrm{Cost}_j(\pi') < +\infty$, then $\mathrm{Cost}_j(\rho) = \mathrm{Cost}_j(\rho')$ (as $\pi$ and $\pi'$ are prefixes of $\rho$ and $\rho'$ respectively). Moreover, since $\tau'_j$ is a $\prec_j$-profitable deviation w.r.t. $(\tau_i)_{i\in\Pi}$, it follows that for all $i \in \Pi$ such that $\mathrm{Cost}_i(\rho) < +\infty$, we have that $\mathrm{Cost}_i(\rho) \le \mathrm{Cost}_i(\rho')$, and there exists $i \in \Pi$ such that $\mathrm{Cost}_i(\rho) < \mathrm{Cost}_i(\rho')$. As $(\sigma_i)_{i\in\Pi}$ is deviation-optimized, Lemma 4.6 implies that there exists some $l \in \Pi$ such that $\mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\rho') < d_{\mathrm{dev}} = \max\{\mathrm{Cost}_i(\rho) \mid \mathrm{Cost}_i(\rho) < +\infty\} + |V|$. As $d_{\mathrm{dev}} \le d_{\mathrm{goal}}(\mathcal{G}) + |V| < d$, we have that $\mathrm{Cost}_l(\pi) = \mathrm{Cost}_l(\rho) = +\infty$ and $\mathrm{Cost}_l(\pi') = \mathrm{Cost}_l(\rho') < d_{\mathrm{dev}}$. This gives a contradiction with the fact that $\tau'_j$ is a $\prec_j$-profitable deviation w.r.t. $(\tau_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$. Therefore, $(\tau_i)_{i\in\Pi}$ is a secure equilibrium in this game. On the other hand, the previous argument also shows that $(\tau_i)_{i\in\Pi}$ is deviation-optimized. □

Remark 4.15. This proof shows in particular that if there exists a goal-optimized and deviation-optimized secure equilibrium in $(\mathcal{G}, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$ with the same cost profile. Together with Remark 4.14, we then proved the following result: if there exists a secure equilibrium with cost profile $(a_i)_{i\in\Pi}$ in $(\mathcal{G}, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium with cost profile $(b_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$, such that for all $i \in \Pi$, $b_i \le a_i$.

Remarks 4.7 and 4.15 imply the proposition below.

Proposition 4.16. Given a multiplayer quantitative reachability game and a tuple of thresholds $(t_i)_{i\in\Pi} \in (\mathbb{N} \cup \{+\infty\})^n$, one can decide in ExpSpace whether there exists a secure equilibrium with cost profile $(c_i)_{i\in\Pi}$ such that for all $i \in \Pi$, $c_i \le t_i$.
The decision problem related to Proposition 4.16 is equivalent to deciding whether there exists a goal-optimized and deviation-optimized secure equilibrium with cost profile $(a_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$ where $d = d_{\mathrm{goal}}(\mathcal{G}) + 3 \cdot |V|$, such that for all $i \in \Pi$, $a_i \le t_i$. Notice that $d$ does not depend on $(t_i)_{i\in\Pi}$.

5. Conclusion and Perspectives 

In this paper, we study the concept of subgame perfect equilibrium, a refinement of Nash 
equilibrium well-suited to the framework of games played on graphs. We also introduce the 
new concept of subgame perfect secure equilibrium. We prove the existence of subgame 
perfect equilibria in multiplayer quantitative reachability games. We also prove the exis- 
tence of subgame perfect secure equilibria, but only in the two-player framework. Finally, 
we provide an algorithm deciding in ExpSpace the existence of secure equilibria in the 
multiplayer case. On the one hand, the first two results have been obtained by topological techniques that are completely different from the techniques used in [BBDP10, BBDP11]. On the other hand, proofs of the last result are strongly inspired by proofs developed in these references, but have required new ideas about the coalition strategies.

There are several interesting directions for future research. We are currently working on the model of quantitative games enriched by allowing $n$-tuples of positive weights on edges (see Theorem 2.4). We do believe that our results remain true in this context. The case of Nash equilibria is already treated in [BBDP11]. Notice that our results trivially generalize to the particular case where the weights of the edges are of the form $(c, \ldots, c)$ with $c \in \mathbb{N}_0$. Indeed it is enough to replace each such edge by a path of length $c$ composed of $c$ new edges (of cost 1).
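The edge replacement just described can be sketched as follows. The function name and the naming scheme of the fresh intermediate vertices are hypothetical, and the sketch assumes that vertices are strings and that the fresh vertices belong to no goal set. Each fresh vertex has a single successor, so it does not matter which player controls it.

```python
def subdivide(weighted_edges):
    """Replace each edge (u, v) of positive integer weight c by a path of c unit-cost edges.

    weighted_edges: dict mapping (u, v) to c. Returns the list of unit-cost edges.
    """
    unit_edges = []
    for (u, v), c in weighted_edges.items():
        if not isinstance(c, int) or c < 1:
            raise ValueError("weights must be positive integers")
        # c - 1 fresh intermediate vertices (hypothetical naming scheme)
        path = [u] + [f"{u}->{v}#{k}" for k in range(1, c)] + [v]
        unit_edges.extend(zip(path, path[1:]))
    return unit_edges


print(subdivide({("A", "B"): 1, ("B", "D"): 2}))
```

An edge of weight 1 is kept as it is, and an edge of weight $c$ gives $c$ consecutive edges, so the number of steps needed to reach a goal set in the new graph equals the accumulated weight in the original one.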

To the best of our knowledge, the existence of secure equilibria in the multi-player framework is still an open problem. We prove that the existence of a secure equilibrium in an infinite game is equivalent to the existence of a goal-optimized and deviation-optimized secure equilibrium in a finite game. This open problem could be positively solved if Corollary 1.14 could be adapted in a way to get a goal-optimized and deviation-optimized secure equilibrium in the finite game, and then by applying Proposition 4.5. A deeper understanding of equilibria with unnecessary cycles could also be helpful. For the moment, we are not able to solve this problem with more than two players. The same kind of question is also open for subgame perfect secure equilibria.

Another research direction concerns a deeper study of the memory needed in the different kinds of equilibria. In the case of subgame perfect equilibria and subgame perfect secure equilibria, the topological techniques give no results on the memory needed. However, in the case of secure equilibria, we prove that we can restrict ourselves to finite-memory equilibria.